How to fix “403 Forbidden” errors when calling APIs using Python requests?
Bypassing 403 Forbidden Errors in Web Scraping: A Step-by-Step Guide Without Selenium
Introduction
Web scraping can feel like navigating a minefield when servers block your requests with 403 Forbidden errors. These errors often occur because websites detect non-browser traffic (like scripts) through mechanisms like TLS fingerprinting, header validation, or IP blocking. While tools like Selenium mimic browsers, they’re resource-heavy. In this guide, I’ll share multiple proven techniques to bypass 403 errors using Python, including a hidden gem: curl_cffi.
The Problem: 403 Forbidden Hell
While trying to scrape some data from a website, my Python script using the popular requests library kept hitting a brick wall:
import requests
response = requests.get(url, headers=perfect_headers) # Always returns 403!
Despite:
- Perfectly replicated headers (via MITMproxy)
- Matching cookies
- Correct user-agent
- Proper TLS configuration
The server kept rejecting my requests with 403 Forbidden errors. Why?
Why 403 Errors Happen
- Missing/Invalid Headers: Servers check for browser-like headers (e.g., sec-ch-ua, user-agent).
- TLS/JA3 Fingerprinting: Servers detect non-browser TLS handshakes.
- IP Rate Limiting: Too many requests from the same IP.
- Path/Protocol Validation: URLs or HTTP versions may trigger suspicion.
The Culprit in My Case: TLS Fingerprinting
Modern websites don’t just check headers; they also analyze your TLS handshake fingerprint (JA3). Libraries like requests and urllib have distinct fingerprints that scream "BOT!" to servers.
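You can verify this yourself with a fingerprint-echo service. The sketch below assumes https://tls.browserleaks.com/json as the echo endpoint and a ja3_hash field in its response (substitute any JA3 echo service and adjust the key name); it only illustrates that a plain requests call and a browser-impersonating call report different fingerprints:
# Assumption: the endpoint echoes back your JA3 hash under "ja3_hash"; key name may vary by service.
import requests
print(requests.get("https://tls.browserleaks.com/json").json().get("ja3_hash"))
# Compare with a browser-impersonating client (installation is covered in the next section):
from curl_cffi import requests as cffi_requests
print(cffi_requests.get("https://tls.browserleaks.com/json", impersonate="chrome110").json().get("ja3_hash"))
# The two hashes differ: the first identifies requests/urllib3, the second matches a real Chrome build.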
The Solution: Use curl_cffi to Impersonate Browser TLS Fingerprints
The curl_cffi library combines cURL’s power with browser-like TLS fingerprints, letting your requests pass JA3 detection.
1. Installation
pip install curl_cffi
2. The Magic Code
# Install: pip install curl_cffi
from curl_cffi import requests
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    "accept": "*/*",
    "referer": "https://example.com"
}
response = requests.get(
    "https://example.com/",
    headers=headers,
    impersonate="chrome110"  # Mimics Chrome 110 TLS
)
Impersonation Targets:
# Available options
impersonate="chrome110"
impersonate="chrome120"
impersonate="safari16"
Key Differentiators
- impersonate parameter specifying Chrome 110
- No SSL verification needed
- Automatic handling of HTTP/2 and brotli encoding
Why This Works
- Spoofs Chrome’s TLS fingerprint, making the request appear browser-like.
- Avoids the need for Selenium or headless browsers.
Tips (a combined sketch follows this list):
- Add random delays between requests
- Rotate user-agent strings
- Use proxy rotation
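A minimal sketch combining the first two tips with curl_cffi; the URLs and user-agent strings are placeholders, and proxy rotation is covered in section 5 below:
import random
import time
from curl_cffi import requests

# Placeholder pool of user-agent strings to rotate through.
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...",
]

for url in ["https://example.com/page1", "https://example.com/page2"]:
    headers = {"user-agent": random.choice(user_agents), "accept": "*/*"}
    response = requests.get(url, headers=headers, impersonate="chrome110")
    time.sleep(random.uniform(2, 5))  # random delay between requests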
Other Solutions to Try
1. Refine Headers to Match Browser Requests
Capture headers from a real browser (using Chrome DevTools or mitmproxy) and include all critical headers like sec-ch-ua, sec-fetch-*, referer, and origin.
Example:
headers = {
    "sec-ch-ua": '"Google Chrome";v="131", "Chromium";v="131", "Not-A Brand";v="24"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "Windows",
    "sec-fetch-site": "same-origin",
    "sec-fetch-mode": "cors",
    "referer": "https://example.com/",
    "priority": "u=1, i"
}
Tip: Simplify headers if they conflict (e.g., use accept: */* instead of complex values).
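A sketch of that fallback: send the full captured headers first, and retry with a simplified accept value only if the 403 persists (the URL and the headers dict from above are placeholders):
import requests

response = requests.get("https://example.com/", headers=headers)
if response.status_code == 403:
    # Conflicting accept values are a common culprit; retry with the simpler form.
    headers["accept"] = "*/*"
    response = requests.get("https://example.com/", headers=headers)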
2. Use Sessions and Rotate User-Agents
Persist cookies and rotate headers with requests.Session:
import requests
from fake_useragent import UserAgent  # Install: pip install fake-useragent

session = requests.Session()
ua = UserAgent()
headers = {
    "user-agent": ua.chrome,
    "accept-language": "en-US,en;q=0.9"
}
session.headers.update(headers)
response = session.get("https://example.com/")
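To actually rotate the user-agent rather than set it once, swap it out before each request while the session keeps cookies alive. A short sketch with placeholder URLs:
# Rotate the user-agent per request; cookies persist in the same session.
for url in ["https://example.com/page1", "https://example.com/page2"]:
    session.headers["user-agent"] = ua.random
    response = session.get(url)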
3. Spoof HTTP/2 with httpx
Some sites require HTTP/2 support. Use httpx for HTTP/2 compatibility:
# Install: pip install httpx[http2]  (the h2 extra is required for http2=True)
import httpx
with httpx.Client(http2=True, headers=headers) as client:
    response = client.get("https://example.com/")
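httpx exposes the negotiated protocol on the response, so you can confirm HTTP/2 was actually used:
print(response.http_version)  # "HTTP/2" if negotiated, otherwise "HTTP/1.1"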
4. Bypass Path Validation
Modify the URL to trick path-based filters:
url = "https://example.com//" # Add trailing slashes
# OR
url = "https://example.com/?cache=1" # Add dummy params
5. Route Through Proxies
Rotate IPs to avoid blocks:
proxies = {
    "http": "http://user:pass@proxy_ip:port",
    "https": "http://user:pass@proxy_ip:port"
}
response = requests.get(url, headers=headers, proxies=proxies)
Free Proxies: Use services like FreeProxyList, but expect instability.
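To rotate rather than reuse a single proxy, pick one from a pool on each request. A sketch with placeholder proxy addresses:
import random
import requests

# Placeholder pool; replace with proxies you actually control or rent.
proxy_pool = [
    "http://user:pass@proxy1:port",
    "http://user:pass@proxy2:port",
]

def fetch(url, headers):
    proxy = random.choice(proxy_pool)
    proxies = {"http": proxy, "https": proxy}
    return requests.get(url, headers=headers, proxies=proxies, timeout=10)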
6. Disable SSL Verification (Last Resort)
If the site blocks non-browser SSL handshakes:
response = requests.get(url, headers=headers, verify=False) # Use with caution!
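With verify=False, requests emits an InsecureRequestWarning on every call; you can silence it (accepting the risk) via urllib3:
import urllib3

# Suppress the InsecureRequestWarning that verify=False triggers.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)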
Conclusion
Bypassing 403 errors requires mimicking browsers at multiple levels: headers, TLS fingerprints, and request patterns. While curl_cffi is a game-changer, combining it with header refinement, HTTP/2, and proxies ensures robust scraping. Always respect robots.txt and avoid overloading servers.
Got your own 403 horror story? Share your experiences in the comments!
⚠️ Disclaimer: This article is for educational purposes only. Always obtain proper authorization before scraping any website.