When should I set the Referer header in web scraping?

The Referer header tells the server which page sent you to the current page. Setting it correctly is important for some sites.

What it contains:

URL of the previous page:

Referer: https://example.com/previous-page

When it matters:

Following links: When scraping page sequences:

# Scraping product from category page
referer = 'https://example.com/category/electronics'
headers = {'Referer': referer}
response = requests.get('https://example.com/product/123', headers=headers)

Form submissions: Forms often check if Referer matches form page:

# Submitting login form
response = requests.post(
    'https://example.com/login',
    data=credentials,
    headers={'Referer': 'https://example.com/login-page'}
)

Anti-hotlinking: Some sites require Referer to serve images/assets:

# Downloading image
headers = {'Referer': 'https://example.com/page-with-image'}
response = requests.get(image_url, headers=headers)

Session tracking: Some sites use Referer to track navigation flow and detect bots.

When to omit:

Direct navigation: If simulating typing URL directly:

# No Referer for direct access
response = requests.get(url)  # Omit Referer header

Privacy settings: Some browsers let users disable Referer.

Cross-origin restrictions: Some sites/browsers limit Referer for cross-origin requests.

Implementation strategies:

Track navigation:

class Scraper:
    def __init__(self):
        self.last_url = None

    def fetch(self, url):
        headers = {}
        if self.last_url:
            headers['Referer'] = self.last_url

        response = requests.get(url, headers=headers)
        self.last_url = url
        return response

For search → product flow:

# 1. Search page
search_url = 'https://example.com/search?q=laptop'
search_page = requests.get(search_url)

# 2. Product page (set Referer to search)
product_url = 'https://example.com/product/123'
product_page = requests.get(
    product_url,
    headers={'Referer': search_url}
)

Scrapy (automatic):

Scrapy adds Referer automatically when following links:

def parse(self, response):
    # Scrapy automatically sets Referer for followed links
    yield response.follow(product_url, callback=self.parse_product)

Security consideration:

Don't send sensitive data in Referer:

  • Tokens in URL query strings
  • Session IDs
  • Private identifiers

Referer vs Referrer:

The HTTP header is misspelled as "Referer" (one 'r'). Always use Referer in code.

Recommendation:

  • Set Referer when navigating through pages sequentially
  • Use the actual previous page URL
  • Omit for direct access (bookmarks, address bar)
  • Track navigation flow to set it correctly
  • Essential for form submissions and asset downloads

Many sites check Referer to prevent direct access or hotlinking.

Related Questions