httpx transport for curl_cffi, a Python binding for the curl-impersonate fork
Unlike other pure Python HTTP clients like httpx (with its native transport) or requests, curl_cffi can impersonate browsers' TLS/JA3 and HTTP/2 fingerprints. Browser simulation is implemented through low-level customizations and the use of native browser TLS libraries (BoringSSL for Chrome, NSS for Firefox), which is impossible to achieve with Python's OpenSSL binding.

If you are blocked by some website for no obvious reason, you can give curl_cffi a try.
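JA3 is one of the fingerprints mentioned above: the server joins selected fields of the TLS ClientHello (version, cipher suites, extensions, elliptic curves, point formats) into a string and takes its MD5 hash. The sketch below illustrates that scheme; it is not code from curl_cffi, the function names are hypothetical, and the field values in the example are made up. It shows why a client built on a different TLS library gets a different hash: any change in the cipher or extension lists changes the string. GREASE values (RFC 8701), which browsers insert randomly, are skipped as in the original JA3 definition.

```python
from __future__ import annotations

import hashlib
from collections.abc import Iterable


def is_grease(value: int) -> bool:
    # GREASE values (RFC 8701) are 0x0A0A, 0x1A1A, ..., 0xFAFA:
    # both bytes are equal and each byte's low nibble is 0xA.
    return (value >> 8) == (value & 0xFF) and (value & 0x0F) == 0x0A


def ja3(version: int, ciphers: Iterable[int], extensions: Iterable[int],
        curves: Iterable[int], point_formats: Iterable[int]) -> tuple[str, str]:
    """Return the JA3 string and its MD5 hash for the given ClientHello fields."""
    def field(values: Iterable[int]) -> str:
        # Values inside a field are dash-separated; GREASE values are dropped.
        return "-".join(str(v) for v in values if not is_grease(v))

    ja3_string = ",".join(
        [str(version), field(ciphers), field(extensions), field(curves), field(point_formats)]
    )
    return ja3_string, hashlib.md5(ja3_string.encode()).hexdigest()


# Made-up ClientHello values; 0x0A0A, 0x1A1A and 0x2A2A are GREASE placeholders.
ja3_string, ja3_hash = ja3(771, [0x0A0A, 4865, 4866], [0x1A1A, 0, 23], [0x2A2A, 29, 23], [0])
print(ja3_string)  # 771,4865-4866,0-23,29-23,0
```

Reordering or adding a single cipher suite or extension yields a different string and therefore a different hash, which is what fingerprinting services compare against known browser values.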
```shell
pip install httpx-curl-cffi
```
```python
from httpx import Client, AsyncClient
from httpx_curl_cffi import CurlTransport, AsyncCurlTransport, CurlOpt

client = Client(transport=CurlTransport(impersonate="chrome", default_headers=True))
client.get("https://tools.scrapfly.io/api/fp/ja3")

async_client = AsyncClient(transport=AsyncCurlTransport(
    impersonate="chrome",
    default_headers=True,
    # required for parallel requests, see curl_cffi issues below
    curl_options={CurlOpt.FRESH_CONNECT: True}
))
```
Note that `httpx.Client` and `httpx.AsyncClient` disable proxy configuration from environment variables when the `transport` argument is provided, even with `trust_env=True` (the default). To configure proxies the same way as the native transport does, use the snippet below:
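```python
from __future__ import annotations  # allows "X | None" annotations on Python < 3.10

import httpx
from httpx._utils import get_environment_proxies
from httpx_curl_cffi import CurlTransport  # or AsyncCurlTransport


def transport_factory(proxy: httpx.Proxy | None = None) -> httpx.BaseTransport:
    # use httpx.AsyncBaseTransport as the return type for AsyncCurlTransport
    return CurlTransport(  # or AsyncCurlTransport
        proxy=proxy,
        verify=False,  # and other custom options (this one disables TLS verification)
    )


client = httpx.Client(  # or httpx.AsyncClient
    transport=transport_factory(),
    # one proxied transport per pattern from HTTP(S)_PROXY / ALL_PROXY;
    # None entries (from NO_PROXY) fall back to the default, unproxied transport
    mounts={
        k: transport_factory(httpx.Proxy(url=v)) if v else None
        for k, v in get_environment_proxies().items()
    },
)
```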
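`get_environment_proxies()` returns a dict that maps httpx mount patterns (such as `"https://"` or `"all://localhost"`) to a proxy URL, or to `None` for hosts listed in `NO_PROXY`; the snippet above turns each entry into a mount. As a rough illustration of that mapping, here is a simplified, hypothetical re-implementation that reads from a plain dict instead of `os.environ`. It is not httpx's code: the real function goes through `urllib.request.getproxies()` and also rewrites domain names into wildcard patterns and brackets IPv6 addresses, which this sketch skips.

```python
from __future__ import annotations

from collections.abc import Mapping


def environment_proxy_mounts(env: Mapping[str, str]) -> dict[str, str | None]:
    """Hypothetical, simplified approximation of httpx's get_environment_proxies()."""

    def lookup(name: str) -> str | None:
        # Lower-case variable names win over upper-case ones, as in urllib.
        return env.get(name) or env.get(name.upper())

    mounts: dict[str, str | None] = {}
    for scheme in ("http", "https", "all"):
        proxy = lookup(f"{scheme}_proxy")
        if proxy:
            # A bare "host:port" value is treated as an http:// proxy URL.
            mounts[f"{scheme}://"] = proxy if "://" in proxy else f"http://{proxy}"

    for host in (h.strip() for h in (lookup("no_proxy") or "").split(",")):
        if host == "*":
            return {}  # NO_PROXY=* disables proxies entirely
        if host:
            mounts[f"all://{host}"] = None  # None: connect directly
    return mounts


mounts = environment_proxy_mounts(
    {"HTTPS_PROXY": "proxy.local:3128", "NO_PROXY": "localhost, internal.example"}
)
print(mounts)
```

With these inputs, HTTPS traffic would go through `http://proxy.local:3128`, while `localhost` and `internal.example` map to `None` and therefore use the unproxied default transport in the client above.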
- `httpx.Request` content is read completely into memory before sending; it's unclear whether this is fixable with `curl_cffi` at all.
- The `CurlTransport.cert` argument should support in-memory data instead of filenames; `pathlib.Path` (instead of strings in `httpx._types.CertTypes`) is forced.
- `httpx.Timeout.pool` is ignored; it should be implemented in `curl_cffi`.
- Simultaneous asynchronous requests require setting `CurlTransport.curl_options={CurlOpt.FRESH_CONNECT: True}` (lexiforest/curl_cffi#302, lexiforest/curl_cffi#319).
- `httpx.Timeout.write` is ignored (a `libcurl` limitation).
- `CurlTransport.verify` as an `ssl.SSLContext` isn't supported (because OpenSSL is not used).
- The `CurlTransport.trust_env` argument is ignored: `libcurl` always reads environment variables for its configuration. For proxies this is disabled with the `CurlOpt.NOPROXY` setting, so that the `proxy` argument has complete control over proxy usage, but environment variables may still affect the TLS configuration; it's unclear whether the curl-impersonate fork reads them (lexiforest/curl_cffi#345).
- `httpx.Response.request.headers` isn't updated with the default curl-impersonate headers, which can be unexpected with `CurlTransport.default_headers=True` (lexiforest/curl_cffi#368).
- The `CurlTransport.cert` argument isn't compatible with the (deprecated) `httpx._types.CertTypes`: it's impossible to pass a password as the third tuple element, and `pathlib.Path` (instead of strings in `httpx._types.CertTypes`) is forced.
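Until in-memory certificates are supported, one possible workaround for the `cert` limitations above is to write PEM data to a temporary file and pass a `pathlib.Path`. The helpers below are hypothetical (not part of httpx-curl-cffi), and the assumption that `CurlTransport.cert` accepts either a single `Path` or a `(cert, key)` tuple of `Path`s is based only on the notes above; check the transport's signature before relying on it. `tempfile.mkstemp` creates the file readable and writable only by the current user, but the caller is still responsible for deleting it.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def pem_to_tempfile(pem: bytes, suffix: str = ".pem") -> Path:
    """Write in-memory PEM data to a private temporary file and return its path."""
    fd, name = tempfile.mkstemp(suffix=suffix)  # created with owner-only permissions
    with os.fdopen(fd, "wb") as f:
        f.write(pem)
    return Path(name)  # caller must delete it (e.g. Path.unlink) when done


def to_path_cert(cert: str | os.PathLike | tuple) -> Path | tuple[Path, Path]:
    """Convert an httpx-style cert value (file, (cert, key) or (cert, key, password))
    into the Path-based form assumed for CurlTransport.cert (hypothetical helper)."""
    if isinstance(cert, (str, os.PathLike)):
        return Path(cert)
    if len(cert) == 3 and cert[2]:
        # A key password cannot be passed through CurlTransport.cert
        raise ValueError("encrypted keys are not supported; decrypt the key first")
    return Path(cert[0]), Path(cert[1])
```

For example, `to_path_cert(("client.crt", "client.key"))` yields a tuple of two `Path` objects, and `pem_to_tempfile(key_bytes)` can stand in for a key that only exists in memory.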