Scrapy integration with curl_cffi (curl-impersonate).
Install with pip:

```
pip install scrapy-curl-cffi
```

Optionally, to enable Scrapy's support for modern HTTP compression protocols, install the `compression` extra:

```
pip install scrapy-curl-cffi[compression]
```
Update your Scrapy project settings as follows:
```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
    "https": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
}

DOWNLOADER_MIDDLEWARES = {
    "scrapy_curl_cffi.middlewares.CurlCffiMiddleware": 200,
    "scrapy_curl_cffi.middlewares.DefaultHeadersMiddleware": 400,
    "scrapy_curl_cffi.middlewares.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": None,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```
To download a `scrapy.Request` with curl_cffi, add the `curl_cffi_options` special key to the `Request.meta` attribute. The value should be a dict with any of the following options:
- `impersonate` - which browser version to impersonate
- `ja3` - JA3 string to impersonate
- `akamai` - Akamai string to impersonate
- `extra_fp` - extra fingerprint options, complementing the JA3 and Akamai strings
- `default_headers` - whether to set default browser headers when impersonating; defaults to `True`
- `verify` - whether to verify HTTPS certificates; defaults to `False`
See the curl_cffi documentation for more info on these options.
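As an illustration, here is a minimal sketch of setting these options on a single request (the URL and option values are illustrative):

```python
import scrapy


class MetaOptionsSpider(scrapy.Spider):
    name = "meta_options"

    def start_requests(self):
        # Per-request curl_cffi configuration via the special meta key.
        yield scrapy.Request(
            "https://tls.browserleaks.com/json",
            meta={
                "curl_cffi_options": {
                    "impersonate": "chrome",   # impersonate a recent Chrome
                    "default_headers": True,   # send default browser headers
                }
            },
        )

    def parse(self, response):
        yield response.json()
```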
Alternatively, you can use the `curl_cffi_options` spider attribute or the `CURL_CFFI_OPTIONS` setting to automatically assign the `curl_cffi_options` meta key for all requests:
```python
import scrapy


class FingerprintsSpider(scrapy.Spider):
    name = "fingerprints"
    start_urls = ["https://tls.browserleaks.com/json"]
    curl_cffi_options = {"impersonate": "chrome"}

    def parse(self, response):
        yield response.json()
```
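The setting-based equivalent might look like this in `settings.py` (a sketch, assuming `CURL_CFFI_OPTIONS` accepts the same dict as the meta key and spider attribute):

```python
# settings.py
# Assumption: CURL_CFFI_OPTIONS takes the same dict as the
# curl_cffi_options meta key / spider attribute.
CURL_CFFI_OPTIONS = {"impersonate": "chrome"}
```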
`scrapy-curl-cffi` strives to adhere to established Scrapy conventions, so most Scrapy settings, spider attributes, request/response attributes, and meta keys configure the crawler's behavior as expected.
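For instance, standard Scrapy features should compose with the curl_cffi handler just as they do with the default one. A sketch (the setting, header, and meta key shown are ordinary Scrapy features, not part of this package):

```python
import scrapy


class ConventionalSpider(scrapy.Spider):
    name = "conventional"
    # An ordinary Scrapy setting, expected to apply as usual.
    custom_settings = {"DOWNLOAD_TIMEOUT": 30}
    curl_cffi_options = {"impersonate": "chrome"}

    def start_requests(self):
        yield scrapy.Request(
            "https://tls.browserleaks.com/json",
            # Standard request attributes and meta keys are expected
            # to behave as with Scrapy's default download handler.
            headers={"Accept-Language": "en-US"},
            meta={"dont_retry": True},
        )

    def parse(self, response):
        yield response.json()
```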