What HTTP headers do I need for web scraping?

Sending the right HTTP headers makes your scraper look like a legitimate browser and reduces the chance of being blocked.

Minimum required headers:

User-Agent (most important): Identifies your browser/client:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36

Without this, many sites block requests immediately.
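To see why this matters: by default, the requests library announces itself with a "python-requests/x.y" User-Agent, which is exactly the kind of value sites filter on. A quick check:

```python
import requests

# requests' stock User-Agent openly identifies the client as a script.
default_ua = requests.utils.default_headers()['User-Agent']
# e.g. 'python-requests/2.32.3' -- an instant giveaway on many sites

# Overriding it with a browser string is the single most important fix:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}
```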

Accept: Tells the server what content types you can handle:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Accept-Language: Preferred language for responses:

Accept-Language: en-US,en;q=0.9

Accept-Encoding: Compression formats you support:

Accept-Encoding: gzip, deflate, br

Additional important headers:

Referer: URL of the previous page (the misspelling is part of the HTTP spec); important when simulating navigation:

Referer: https://example.com/previous-page

Connection: Keeps the TCP connection open for subsequent requests:

Connection: keep-alive

Upgrade-Insecure-Requests: Signals that the client prefers an encrypted (HTTPS) response:

Upgrade-Insecure-Requests: 1

Complete Python example:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}

url = 'https://example.com'  # replace with your target page
response = requests.get(url, headers=headers)
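For multi-page scrapes, a requests.Session is a natural fit: it reuses the underlying TCP connection (matching the Connection: keep-alive header above) and reapplies your headers on every request. A sketch:

```python
import requests

# A Session reuses the TCP connection (honouring keep-alive) and
# applies these headers to every request it makes.
session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
})

# Per-request headers merge with the session defaults, so a Referer
# can be set as you navigate from page to page:
# response = session.get('https://example.com/page-2',
#                        headers={'Referer': 'https://example.com/previous-page'})
```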

Headers to avoid:

Don't send headers that reveal automation:

  • X-Automated-Tool or similar custom headers
  • "Bot" or "Scraper" anywhere in the User-Agent string
  • Mismatched header combinations

When to add more headers:

For tougher sites, add:

  • Sec-Fetch-* headers (fetch metadata sent by Chrome and other modern browsers)
  • DNT (Do Not Track)
  • Cache-Control
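As a sketch, these can be layered on top of the minimum set. The Sec-Fetch-* values below correspond to a top-level page navigation; real browsers vary them by request context:

```python
# Minimum set (abridged from the complete example above).
base_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
}

# Extra headers a Chrome-style browser sends on a top-level navigation.
extra_headers = {
    'Sec-Fetch-Site': 'none',      # no referring site (typed URL / bookmark)
    'Sec-Fetch-Mode': 'navigate',  # top-level page navigation
    'Sec-Fetch-User': '?1',        # initiated by a user gesture
    'Sec-Fetch-Dest': 'document',  # response will be rendered as a page
    'DNT': '1',                    # Do Not Track
    'Cache-Control': 'max-age=0',  # what Chrome sends on a fresh navigation
}

headers = {**base_headers, **extra_headers}
```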

Best practice:

Use a header generator to get realistic, matched header sets for your target browser. Mismatched headers (e.g., Safari User-Agent with Chrome-specific headers) can trigger detection.
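One simple way to keep header sets consistent is to store them as whole per-browser profiles and always pick an entire profile, never mixing fields between them. A minimal sketch (the profile names and User-Agent strings here are illustrative, not authoritative):

```python
import random

# Hand-maintained profiles: every header in a profile comes from the
# same real browser, so the combination stays internally consistent.
PROFILES = {
    'chrome-win': {
        'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/124.0.0.0 Safari/537.36'),
        'Accept-Language': 'en-US,en;q=0.9',
        'Sec-Fetch-Mode': 'navigate',  # Chrome sends Sec-Fetch-* headers
    },
    'safari-mac': {
        'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                       'AppleWebKit/605.1.15 (KHTML, like Gecko) '
                       'Version/17.4 Safari/605.1.15'),
        'Accept-Language': 'en-US,en;q=0.9',
        # no Sec-Fetch-* here: older Safari versions did not send them
    },
}

def pick_profile():
    """Return one internally consistent header set, never a mix of two."""
    return dict(random.choice(list(PROFILES.values())))
```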
