
How to Bypass Cloudflare When Web Scraping (curl_cffi Guide) [2026]

  • Writer: tony56024
  • Oct 17, 2025
  • 12 min read

Updated: Jun 4

curl_cffi Cloudflare bypass - web scraping without getting blocked (2026)


If your Python scrapers keep running into 403s, CAPTCHA walls, or that maddening situation where requests just quietly get throttled, here's the thing most people miss: it's usually not your code, and it's often not even your IP. It's your TLS fingerprint. The requests library basically announces itself as Python the second it opens a connection, well before the server has read a single one of your headers.


curl_cffi fixes that at the exact layer where the detection happens. In this guide we'll walk through what it does, what it can't do, and how we use it in production, with code you can copy and run as-is.


One thing we want to be straight about up front, because a lot of articles aren't: curl_cffi beats TLS and network-layer detection, and that's the gate on most e-commerce and data sites. What it doesn't do is run JavaScript. So on its own, it won't clear Cloudflare Turnstile or those "Checking your browser…" challenges. We'll come back to that.


Why requests Gets Blocked


For years, requests was the obvious choice: stable, readable, easy to teach. We still reach for it constantly. But the web kept evolving and the library, by design, mostly didn't.

When requests opens an HTTPS connection, the TLS handshake it produces looks nothing like a browser's. And anti-bot systems are reading four things long before your headers even come into play:


  • TLS fingerprint (JA3/JA4): requests sends a static OpenSSL handshake that Cloudflare has seen millions of times, so it gets recognized instantly. (JA3 is the original TLS-fingerprinting standard, open-sourced by Salesforce in 2017; JA4 is its successor from FoxIO. Both are what anti-bot systems compute on your handshake, and Cloudflare documents using them directly in Bot Management.)

  • HTTP version: requests only speaks HTTP/1.1, while real Chrome negotiates HTTP/2. That mismatch alone is enough to flag you.

  • Header order and casing: browsers send headers in a specific, consistent order, and requests doesn't match it.

  • ALPN negotiation: the protocol-negotiation sequence isn't what any real browser would send.


You can spoof the User-Agent all day, but the handshake underneath still says "Python." It's a disguise with the wrong voice. (For the wider picture of how sites detect and block automated traffic, see our guide on how to bypass anti-scraping tools.)
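To make the fingerprint idea concrete, here is a minimal sketch of how a JA3 hash is derived: five fields from the ClientHello are written out in the order they were sent, joined into one string, and hashed with MD5. The function names and the sample values are ours and purely illustrative; they are not taken from any real browser handshake.

```python
import hashlib

# GREASE values (RFC 8701) are random placeholders that browsers inject.
# JA3 drops them so the hash stays stable from one connection to the next.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input: five comma-separated fields, with the values
    inside each field joined by dashes, in the order they were sent."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(tls_version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(ja3):
    """The published fingerprint is the MD5 hex digest of that string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Illustrative values only, not a real browser's ClientHello.
example = ja3_string(771, [0x0A0A, 4865, 4866], [0, 23, 65281], [29, 23], [0])
print(example)            # 771,4865-4866,0-23-65281,29-23,0
print(ja3_hash(example))
```

Because order is part of the input, two clients offering the same ciphers in a different sequence produce different hashes, which is why changing headers alone never alters the fingerprint.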



What curl_cffi Actually Is


curl_cffi is a Python HTTP client built on a fork of curl-impersonate, wired up through CFFI (a fast bridge between C and Python). Rather than just patching headers, it swaps out the TLS stack so the handshake itself matches a real browser, right down to cipher suites, extension ordering, GREASE values, supported curves, and HTTP/2 settings.

So when you tell it to impersonate Chrome, your script sends the exact encrypted handshake Chrome would. From Cloudflare's Bot Management or Akamai's point of view, the connection reads as a real browser session at the network layer.


Two things, in our experience, make it practical and not just a neat trick:

  1. The API mirrors requests. Porting an existing scraper is usually a one-line change to your import. (If you're newer to the ecosystem, our web scraping in Python guide covers the fundamentals curl_cffi slots into.)

  2. It's fast. Sitting on libcurl's C core, it holds its own against aiohttp and leaves pure-Python clients behind once you're doing serious volume.


Quick Start


Install it first (heads-up: the package name uses a hyphen, the import uses an underscore, which is easy to trip over):

pip install curl-cffi

Here's the smallest Cloudflare-safe request you can make:

from curl_cffi import requests

resp = requests.get("https://www.amazon.com/", impersonate="chrome")
print(resp.status_code)
print(resp.http_version)   # libcurl's version code; 3 corresponds to HTTP/2, a browser-like negotiation

A small but important habit: use impersonate="chrome", not a pinned version like chrome124. The generic alias always resolves to the newest profile your installed version supports, so your scraper doesn't quietly rot as browsers move on. We've seen plenty of "it used to work" tickets that traced back to nothing more than a stale pinned profile.

Now compare that to plain requests, which on the same URL usually gets the door slammed:

import requests

resp = requests.get("https://www.amazon.com/")
print(resp.status_code)   # commonly 403, or a 200 that's actually a bot page

And while we're here, that last comment matters. A 200 does not mean you won. Plenty of sites hand back a 200 with a CAPTCHA or a "verify you're human" page sitting in the body. Always look at what came back, not just the status code.
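One way to build that habit into a pipeline is a small check that looks at both the status code and the body. This is a rough sketch: the helper name, the status codes treated as blocks, and the marker strings are all assumptions, so tune them to the block pages your own targets actually serve.

```python
# Hypothetical marker list: adjust it to the block pages you actually see.
BLOCK_MARKERS = (
    "just a moment",
    "checking your browser",
    "verify you are human",
    "captcha",
)


def looks_blocked(status_code, body):
    """Return True when a response is probably a block or challenge page."""
    if status_code in (403, 429, 503):
        return True
    text = body.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


print(looks_blocked(200, "<title>Just a moment...</title>"))  # True
print(looks_blocked(200, "<h1>Product listing</h1>"))         # False
```

In a scraper you would call it as looks_blocked(resp.status_code, resp.text). A plain substring match can misfire on legitimate pages that happen to mention "captcha", so treat a hit as a reason to inspect the response, not as proof.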


A Realistic Production Request


In an actual pipeline you'll want a persistent session (so cookies and connections get reused), proxies, and headers that look the part:

from curl_cffi import requests

session = requests.Session(impersonate="chrome")

headers = {
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.google.com/",
}

proxies = {
    "http": "http://user:pass@proxy-host:8080",
    "https": "http://user:pass@proxy-host:8080",
}

resp = session.get("https://httpbin.org/anything",
                   headers=headers, proxies=proxies)
print(resp.json())

Since the session negotiates the way Chrome does, the connection holds up as authentic end to end: not just on the first request, but across the whole cookie-backed session.

And here are a few things the beginner tutorials tend to skip, which we've learned the hard way:


  • Sessions aren't thread-safe. Give each thread its own session, or move to AsyncSession (below). Share one session across threads and you'll get the kind of intermittent failures that eat an afternoon.

  • Actually verify the fingerprint landed. In staging, route traffic through a local mitmproxy and log the JA4 hash of your outbound connections, then confirm it matches the browser you're claiming before you ship. This is how you catch "profile drift," where a target tightens its detection so your once-good profile silently stops matching, before it starts quietly degrading your data.

  • Pin the library version and audit it quarterly. Browser TLS parameters shift with each major release, so lock curl-cffi in your dependency file and revisit it when a new Chrome stable lands.
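For the thread-safety point above, a common pattern is to keep one session per thread with threading.local. The sketch below assumes that pattern; get_session and make_session are hypothetical helper names, and the factory is injectable so the logic can be tried without touching the network.

```python
import threading

_local = threading.local()


def make_session():
    # Imported lazily so the helper can be exercised without curl_cffi installed.
    from curl_cffi import requests
    return requests.Session(impersonate="chrome")


def get_session(factory=make_session):
    """Return the calling thread's own session, creating it on first use."""
    session = getattr(_local, "session", None)
    if session is None:
        session = factory()
        _local.session = session
    return session
```

Each worker calls get_session() instead of sharing a module-level session, so cookies and connection state never cross thread boundaries.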


Async Scraping for High Volume


For I/O-bound pipelines hammering hundreds of URLs, async beats threading, and it sidesteps the thread-safety headache entirely:


import asyncio
from curl_cffi.requests import AsyncSession

async def fetch(session, url):
    resp = await session.get(url, impersonate="chrome")
    return resp.status_code, len(resp.text)

async def main(urls):
    async with AsyncSession() as session:
        tasks = [fetch(session, u) for u in urls]
        return await asyncio.gather(*tasks)

urls = ["https://httpbin.org/anything"] * 20
print(asyncio.run(main(urls)))

AsyncSession is part of the base package, so pip install curl-cffi is all you need for async. The async session carries the exact same browser-grade fingerprint as the sync one, and the async with block makes sure connections get closed cleanly. (The official asyncio quick-start is the canonical reference if you want to go deeper.)
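The example above launches every request at once, which is fine for 20 URLs but can trip rate limits at a few hundred. A minimal sketch of capping concurrency with asyncio.Semaphore follows; gather_bounded is a hypothetical helper name, and fetch stands for any coroutine function that takes a URL.

```python
import asyncio


async def gather_bounded(fetch, urls, limit=10):
    """Run fetch(url) for every URL with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True keeps one failed URL from discarding the rest;
    # failures come back as exception objects in the result list.
    return await asyncio.gather(
        *(guarded(u) for u in urls),
        return_exceptions=True,
    )
```

With the fetch coroutine shown earlier you could call gather_bounded(lambda u: fetch(session, u), urls, limit=10) inside the async with block. Results come back in the same order as the input URLs.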


requests vs curl_cffi



There's really one row that decides everything here, and it's the last technical one: neither library runs JavaScript. That single fact draws the line around what curl_cffi can and can't do for you.


How curl_cffi Compares to Other HTTP Clients


requests isn't your only option, and curl_cffi isn't the only client that can fake a fingerprint. So here's how the field actually shakes out in 2026. We've split it into clients that can impersonate TLS fingerprints and the standard ones that can't.



A few honest takeaways from working with these:


  • httpx and aiohttp are great clients that just don't help here. They're modern and quick, but they still hand over a Python TLS fingerprint, so against a Cloudflare TLS gate they fall over for the same reason requests does. Save them for APIs and unprotected targets, not anti-bot work.

  • primp is genuinely faster if throughput is your bottleneck, and it's worth benchmarking on your own workload. It's also picked up an AsyncClient and, unusually, lets you choose the impersonated OS independently of the browser. We dig into it just below.

  • tls-client and CycleTLS lean Go/Node. Both are solid, but calling them from a Python codebase means a binding layer and the friction that comes with it. If you're already in Python, that friction rarely earns its keep over curl_cffi.

  • hrequests is the interesting one because it can fall back to a real browser, which starts to chip away at the JS-challenge gap we'll get to, though you pay for it with a heavier dependency.


For most Python teams, curl_cffi lands in the sweet spot: maturity, broad browser coverage, HTTP/2 and HTTP/3, async, and an API that's basically a drop-in. primp is the one we'd A/B test against it when performance really counts.


And one caveat that covers every row in the top half of that table: TLS impersonation is the foundation, not the whole building. Even a flawless Chrome fingerprint won't get you past a JavaScript challenge, which is precisely where curl_cffi and its alternatives all hit the wall.


The Alternative Worth Knowing: primp


Most curl_cffi write-ups wave vaguely at "other libraries exist" and move on. We think primp deserves a real look, because it's the one mainstream alternative that goes toe-to-toe with curl_cffi on its own turf, and it has a couple of tricks curl_cffi simply doesn't.

primp ("Python Requests IMPersonate") is a Python binding to the Rust rquest library. It impersonates the same set of things, TLS/JA3/JA4 and HTTP/2 fingerprints, but it's built for raw speed, and it bills itself as the fastest impersonating client in Python.

The basic usage is clean:

import primp

client = primp.Client(impersonate="chrome_146", impersonate_os="windows")
resp = client.get("https://tls.peet.ws/api/all")
print(resp.status_code)

It also has an async client:

import asyncio
import primp

async def main():
    async with primp.AsyncClient(impersonate="chrome_146") as client:
        resp = await client.get("https://tls.peet.ws/api/all")
        print(resp.status_code)

asyncio.run(main())

Two things it genuinely does better than curl_cffi:


  1. Independent OS selection. In curl_cffi the operating system is baked into each browser profile, so you get whatever OS that profile ships with, take it or leave it. primp lets you set impersonate_os (windows, macos, linux, android, ios, or random) separately from the browser. That extra knob matters when a target cross-checks the OS implied by your TLS fingerprint against your User-Agent and Client Hints.

  2. Speed. Being Rust under the hood, it's tuned for throughput and tends to come out on top in head-to-head benchmarks.


Where curl_cffi still wins for most teams: it's more mature, has a bigger community and far more documentation, ships HTTP/3 fingerprints, and its API is closer to a true requests drop-in. primp's API is requests-flavored but not identical, since you spin up a Client rather than calling module-level functions the same way.


So, bottom line: make curl_cffi your default. Reach for primp when you're throughput-bound at serious volume, or when you specifically need to split the impersonated OS from the browser profile, which is a genuine edge case curl_cffi can't handle today.


Where curl_cffi Stops: JS Challenges


curl_cffi handles the network layer, full stop. What it can't touch is anything that runs code inside the browser. So if you hit a "Checking your browser…" screen, or you notice a cf_clearance cookie getting set after a short wait, that page is running a JavaScript challenge (Cloudflare's "I'm Under Attack" mode, often shortened to IUAM, or Turnstile), and TLS impersonation on its own won't get you through.


When that happens, you've basically got three options:

  1. Go hybrid. Let a real browser (Playwright, Nodriver) solve the challenge once, grab the cf_clearance cookie, then hand it to curl_cffi for all the fast follow-up requests. You pay the browser tax once instead of on every call. (This is the path you take when a page needs full rendering, the same approach in our walkthrough on scraping a dynamic website with Python.)

  2. Use a solver. Services like CapSolver or 2Captcha can handle the challenge token programmatically.

  3. Offload it entirely. A managed unblocking API takes fingerprinting, challenges, and proxies off your plate completely.
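For the hybrid option, the handoff itself is mostly a matter of reshaping cookies. The sketch below assumes the browser cookies arrive as a list of dicts with name, value, and domain keys, which is the shape Playwright's context.cookies() returns. cookies_for is a hypothetical helper and the cookie values are made up.

```python
def cookies_for(browser_cookies, host):
    """Pick the cookies a browser stored for `host` (or a parent domain)
    and return them as a plain name -> value dict."""
    picked = {}
    for cookie in browser_cookies:
        domain = cookie["domain"].lstrip(".")
        if host == domain or host.endswith("." + domain):
            picked[cookie["name"]] = cookie["value"]
    return picked


# Shape mirrors Playwright's context.cookies(); the values are invented.
browser_cookies = [
    {"name": "cf_clearance", "value": "abc123", "domain": ".example.com"},
    {"name": "tracker", "value": "zzz", "domain": ".ads.example.net"},
]
cookies = cookies_for(browser_cookies, "www.example.com")
print(cookies)  # {'cf_clearance': 'abc123'}
```

The resulting dict can be passed to a curl_cffi session through the cookies argument. The clearance cookie is typically honored only when follow-up requests come from the same IP and send the same User-Agent as the browser that earned it, so treat that as an assumption to verify against your target.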


For the big chunk of sites protected only by TLS fingerprinting, meaning most e-commerce catalogs, pricing pages, and listing data, curl_cffi on its own gets the job done.


Choosing a Browser Profile


A few rules of thumb that have saved us a lot of debugging:


  • Default to impersonate="chrome" (or "safari"). The aliases follow the latest supported fingerprints on their own, so you don't have to think about it.

  • If you must pin a version, pin a recent one. Profiles run from ancient (chrome99) to current, and the newest ones track the latest stable releases. An old pinned profile in 2026 is, ironically, its own detection signal.

  • Watch for version gaps. curl_cffi only adds a new profile when the fingerprint actually changes, so if a version number looks "missing," its fingerprint just didn't change in any meaningful way. Reach for the nearest available one and match the headers. (The official list of supported impersonate targets is the source of truth here.)
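If you do pin, the "nearest available" rule can be written down rather than eyeballed. This is a small sketch under stated assumptions: nearest_profile is a hypothetical helper, the available list is illustrative rather than the library's real list, and breaking ties toward the newer profile is our choice.

```python
import re


def nearest_profile(wanted_version, available):
    """Return the desktop Chrome profile whose major version is closest to
    `wanted_version`, preferring the newer one on a tie."""
    candidates = []
    for name in available:
        match = re.fullmatch(r"chrome(\d+)", name)
        if match:
            candidates.append((int(match.group(1)), name))
    if not candidates:
        raise ValueError("no versioned chrome profiles available")
    _, name = min(
        candidates,
        key=lambda c: (abs(c[0] - wanted_version), -c[0]),
    )
    return name


# Illustrative list; read the real one from the library's documentation.
available = ["chrome99", "chrome110", "chrome120", "chrome124", "chrome99_android"]
print(nearest_profile(122, available))  # chrome124 (tie with chrome120, newer wins)
```

Whatever profile it returns, keep the User-Agent and Client Hints headers consistent with that same version.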


Still Getting Blocked? A Checklist


If your fingerprint is right and you're still getting blocked, work down this list. It's roughly the order we troubleshoot in:


  1. Datacenter IPs. Cloudflare scores datacenter ranges as low-trust regardless of how perfect your fingerprint is. Switch to residential proxies. This is the single most common fix. (Our guide to using proxies for web scraping breaks down the residential-versus-datacenter trade-off.)

  2. Stale profile. Update the library (pip install -U curl-cffi) and use the generic chrome alias.

  3. Request rate. Real users don't load 100 pages in two seconds. Add randomized 1 to 3 second delays.

  4. Thin headers. Add Accept-Language, Accept-Encoding, and a plausible Referer.

  5. Session/IP mismatch. Reusing one session across rotating IPs is inconsistent and flaggable. Keep one session per IP.

  6. It's a JS challenge. If none of the above helps and you see a challenge page, you've hit the JS wall, so pair with a browser or solver as above.
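Items 3 and 5 on that list are easy to get wrong by hand, so here is one rough way to encode them. Both paced and SessionPerProxy are hypothetical names, and the session factory is passed in so the sketch stays independent of any particular client.

```python
import random
import time


def paced(items, low=1.0, high=3.0, sleep=time.sleep, rng=random):
    """Yield items one at a time, pausing a random low..high seconds
    between them (no pause before the first item)."""
    for index, item in enumerate(items):
        if index:
            sleep(rng.uniform(low, high))
        yield item


class SessionPerProxy:
    """Keep exactly one session per proxy URL so cookies never travel
    from one exit IP to another."""

    def __init__(self, factory):
        # factory builds a new session, for example:
        # lambda: requests.Session(impersonate="chrome")
        self._factory = factory
        self._sessions = {}

    def get(self, proxy_url):
        if proxy_url not in self._sessions:
            self._sessions[proxy_url] = self._factory()
        return self._sessions[proxy_url]
```

A loop then reads for url in paced(urls): and fetches each URL with pool.get(proxy_url), which keeps the pacing and the session-to-IP pairing in one place.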



When curl_cffi Is the Right Tool


Reach for it when you're running into 403s or CAPTCHAs on TLS-fingerprinting sites, when you want HTTP/2 performance at volume, or when you're running scraping APIs that have to keep their success rates up. For lightweight public data or a simple API, honestly, plain requests is still fine, with no need to drag in the extra machinery.


The bigger picture: reliable large-scale scraping comes from browser-accurate networking, not from brute-force retries or firing up a headless browser for every single page.


curl_cffi hands you most of a browser's stealth at a tiny fraction of what Playwright or Puppeteer cost in resources, and you keep the heavy tools in reserve for the genuinely JS-gated pages that actually need them. (If you're still mapping out your stack, our roundup of web scraping tools and our walkthrough on building a web crawler from scratch are good companions to this piece.)


FAQ

  1. Is curl_cffi a drop-in replacement for requests? 

    Pretty much. Swap import requests for from curl_cffi import requests, add impersonate="chrome" to your calls, and most of your existing code keeps running untouched.

  2. Does it bypass Cloudflare Turnstile?

    No. Turnstile is a JavaScript challenge and curl_cffi has no JS engine. Use it for TLS-gated sites, and pair it with a browser or solver when you hit a JS challenge.

  3. Which profile should I use? 

    impersonate="chrome". It resolves to the latest supported fingerprint on its own, so you're never tracking version numbers or accidentally pinning something stale.

  4. Why am I still blocked even though I'm impersonating? 

    Usually, in this order: datacenter IPs, too high a request rate, thin headers, a stale library, or a JavaScript challenge that the network layer just can't clear. Run the checklist above.

  5. Is it legal to use? 

    The library itself is open-source and perfectly legal. Whether a given scrape is lawful comes down to the site's Terms of Service, its robots.txt, and the laws that apply to you (including GDPR if personal data is involved), not the tool you picked. Check robots.txt, respect crawl delays, and stay away from anything behind a login. (We cover this in more depth in is web scraping legal?)


Conclusion: When to Build, and When to Hand It Off


curl_cffi is the right first move against TLS-layer detection. It's mature, it's fast, it's close to a drop-in for requests, and for the big share of sites that are gated only by fingerprinting, it's often everything you need. Pair it with residential proxies, sane request pacing, and the generic chrome profile, and you'll clear most of what currently blocks requests.


But we'd rather be honest about what comes after that first win. A reliable scraper was never really one library. It's an ongoing operation. You're rotating and vetting proxy pools, keeping fingerprint profiles current as browsers ship new versions, handling the JavaScript challenges curl_cffi can't, watching for silent fingerprint drift, and babysitting the ban-and-retry logic the hard targets force on you.


Any one of those is manageable. Stacked together, at scale, they quietly turn into a full-time engineering job, and that maintenance load tends to grow faster than the data you're actually collecting.

That's really the decision hiding behind the tooling. If scraping is your product, building all of this in-house makes sense. If you just need the data and not the infrastructure headache, the math usually points the other way.


That second path is what we do at Datahut. We run managed, fully compliant web scraping as a service. We handle the fingerprinting, the proxy management, the anti-bot challenges, and the pipeline reliability, so your team ends up with clean, structured data on a schedule instead of a growing pile of broken scrapers.


Whether you're tracking competitor pricing, keeping an eye on product assortment and stock, or building a data feed across hundreds of protected sites, we deliver the output and absorb the upkeep.


So if your scrapers are spending more time getting unblocked than actually collecting data, come talk to us at Datahut. We'll scope it with you and tell you honestly whether managed extraction is the right call for your case.


References & Further Reading


All primary sources, meaning official project documentation and the original creators of the underlying standards, not third-party vendors:



For background: JA3 was invented at Salesforce in 2017, though the project is no longer actively maintained there; its original creator, John Althouse, now maintains the newer fingerprinting work at FoxIO. JA4 TLS client fingerprinting is open-source under the BSD 3-Clause licence, the same as JA3, with no patent claims, so any tool using JA3 can move to JA4 freely.




