Proxy for Chrome not working anymore #1265

@maidou-00

Hello everyone. I recently upgraded to v0.23.2 (nightly build), and the proxy for Chrome no longer works. I live in a place where the internet is censored, so a proxy is a must.

Here is my config:

chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    container_name: Hoarder-CHROME
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
      - --proxy-server='https=172.21.0.1:1080'   # Note: this was previously - --proxy-server=172.21.0.1:1080, which was working before

I also tried redeploying and restarting, the usual drill.

Logs (trying to access Google):
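
One thing that may be worth ruling out in the config above: assuming Compose hands each `command` list item to Chrome as-is, with no shell in between to strip quotes, the single quotes in the new flag would become part of the proxy value. A minimal TypeScript sketch of that assumption (`flagValue` is a hypothetical helper, not part of Hoarder or Chrome):

```typescript
// Assumption: each YAML list item reaches Chrome verbatim (no shell involved),
// so any quote characters inside the item are part of the argument.
const quotedArg = "--proxy-server='https=172.21.0.1:1080'";
const plainArg = "--proxy-server=172.21.0.1:1080";

// Hypothetical helper: return everything after the first "=" of a flag.
function flagValue(arg: string): string {
  const idx = arg.indexOf("=");
  return idx === -1 ? "" : arg.slice(idx + 1);
}

const quotedValue = flagValue(quotedArg);
const plainValue = flagValue(plainArg);

// The quoted variant keeps its surrounding single quotes in the value.
console.log(quotedValue);
console.log(plainValue);
```

If that is what happens here, writing the list item without the inner quotes would avoid it. This has not been verified against this setup.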

2025-04-14T10:20:22.380Z info: [Crawler][2909] Will crawl "https://www.google.com" for link with id "ohywrfgs6c5l93scrd61t6hk"
2025-04-14T10:20:22.380Z info: [Crawler][2909] Attempting to determine the content-type for the url https://www.google.com
2025-04-14T10:20:27.382Z error: [Crawler][2909] Failed to determine the content-type for the url https://www.google.com: AbortError: The operation was aborted.
2025-04-14T10:22:27.492Z error: [Crawler][2909] Crawling job failed: TimeoutError: Navigation timeout of 120000 ms exceeded
TimeoutError: Navigation timeout of 120000 ms exceeded
    at new Deferred (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:59:34)
    at Deferred.create (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:21:16)
    at new LifecycleWatcher (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/LifecycleWatcher.js:65:60)
    at CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:136:29)
    at CdpFrame.<anonymous> (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/decorators.js:98:27)
    at CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:43)
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:2115)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:9435)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:13098)
2025-04-14T10:22:30.921Z info: [Crawler][2909] Will crawl "https://www.google.com" for link with id "ohywrfgs6c5l93scrd61t6hk"
2025-04-14T10:22:30.922Z info: [Crawler][2909] Attempting to determine the content-type for the url https://www.google.com
2025-04-14T10:22:35.924Z error: [Crawler][2909] Failed to determine the content-type for the url https://www.google.com: AbortError: The operation was aborted.
2025-04-14T10:24:36.035Z error: [Crawler][2909] Crawling job failed: TimeoutError: Navigation timeout of 120000 ms exceeded
TimeoutError: Navigation timeout of 120000 ms exceeded
    at new Deferred (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:59:34)
    at Deferred.create (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:21:16)
    at new LifecycleWatcher (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/LifecycleWatcher.js:65:60)
    at CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:136:29)
    at CdpFrame.<anonymous> (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/decorators.js:98:27)
    at CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:43)
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:2115)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:9435)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:13098)
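
The gaps between the timestamps in the log above can be checked with a short sketch. It only does date arithmetic on values copied from the first attempt; `elapsedMs` is a hypothetical helper, not something from the crawler code.

```typescript
// Elapsed milliseconds between two ISO-8601 timestamps copied from the log.
function elapsedMs(start: string, end: string): number {
  return Date.parse(end) - Date.parse(start);
}

// "Attempting to determine the content-type" -> "Failed to determine the content-type"
const probeMs = elapsedMs("2025-04-14T10:20:22.380Z", "2025-04-14T10:20:27.382Z");

// "Failed to determine the content-type" -> "Crawling job failed"
const navigationMs = elapsedMs("2025-04-14T10:20:27.382Z", "2025-04-14T10:22:27.492Z");

console.log(probeMs, navigationMs);
```

The first gap is about 5 seconds, which suggests (as an assumption, not confirmed from the source code) a 5-second limit on the content-type check. The second gap lines up with the 120000 ms navigation timeout reported in the error.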

Update

With the proxy flag set to - --proxy-server=172.21.0.1:1080, the logs are as follows. It seems the only problem is "Failed to determine the content-type for the url https://www.google.com": Chrome is able to navigate to the page and read it, but the crawl still does not complete successfully.

2025-04-14T10:49:09.299Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2025-04-14T10:49:09.305Z info: [Crawler] Successfully resolved IP address, new address: http://172.83.0.17:9222/
2025-04-14T10:49:17.801Z info: [Crawler][2913] Will crawl "https://www.google.com" for link with id "ohywrfgs6c5l93scrd61t6hk"
2025-04-14T10:49:17.801Z info: [Crawler][2913] Attempting to determine the content-type for the url https://www.google.com
2025-04-14T10:49:22.801Z error: [Crawler][2913] Failed to determine the content-type for the url https://www.google.com: AbortError: The operation was aborted.
2025-04-14T10:49:35.272Z info: [Crawler][2913] Successfully navigated to "https://www.google.com". Waiting for the page to load ...
2025-04-14T10:49:37.079Z info: [Crawler][2913] Finished waiting for the page to load.
2025-04-14T10:49:37.095Z info: [Crawler][2913] Successfully fetched the page content.
2025-04-14T10:49:38.262Z info: [Crawler][2913] Finished capturing page content and a screenshot. FullPageScreenshot: true
2025-04-14T10:49:38.269Z info: [Crawler][2913] Will attempt to extract metadata from page ...
2025-04-14T10:49:39.257Z info: [Crawler][2913] Will attempt to extract readable content ...
2025-04-14T10:49:40.088Z info: [Crawler][2913] Done extracting readable content.
2025-04-14T10:49:40.268Z info: [Crawler][2913] Stored the screenshot as assetId: 70e73c50-d5b5-4a92-a168-97589ed3d483
2025-04-14T10:51:48.809Z info: [Crawler][2913] Will crawl "https://www.google.com" for link with id "ohywrfgs6c5l93scrd61t6hk"
2025-04-14T10:51:48.809Z info: [Crawler][2913] Attempting to determine the content-type for the url https://www.google.com
2025-04-14T10:51:48.820Z error: [Crawler][2913] Crawling job failed: Error: Timed-out after 150 secs
Error: Timed-out after 150 secs
    at Timeout._onTimeout (/app/apps/workers/utils.ts:2:1025)
    at listOnTimeout (node:internal/timers:594:17)
    at process.processTimers (node:internal/timers:529:7)

My env variable:

Image
