Why do I need to set a User-Agent when web scraping?
Setting a proper User-Agent header is essential for successful web scraping.
Default behavior looks suspicious:
Most HTTP libraries send a default User-Agent header that identifies the client library:
- Python requests: python-requests/2.28.0
- cURL: curl/7.68.0
- Node.js fetch: node-fetch/2.6.1
These values immediately identify your traffic as automated, making it easy for websites to block it.
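You can check what your own client sends before pointing it at a real site; for example, with requests (a minimal check, no network call needed):

import requests

# The default User-Agent string requests attaches, e.g. 'python-requests/2.28.0'
print(requests.utils.default_user_agent())

# A fresh Session is pre-populated with the same value
print(requests.Session().headers['User-Agent'])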
Why websites check User-Agent:
- Identify and block bots and scrapers
- Serve different content to different browsers
- Track browser statistics and compatibility
- Enforce API usage policies
- Prevent automated data extraction
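As a rough, hypothetical illustration of how such a check might work on the server side (the blocked substrings here are assumptions, not any specific site's rules):

# Hypothetical sketch of a naive server-side User-Agent filter
BLOCKED_SUBSTRINGS = ('python-requests', 'curl', 'node-fetch', 'bot', 'spider')

def looks_automated(user_agent):
    # Treat an empty User-Agent or an obvious automation signature as a bot
    ua = (user_agent or '').lower()
    return ua == '' or any(token in ua for token in BLOCKED_SUBSTRINGS)

print(looks_automated('python-requests/2.28.0'))  # True -> request gets blocked
print(looks_automated('Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...'))  # False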
Missing User-Agent consequences:
Many servers will:
- Return 403 Forbidden errors
- Serve degraded or blocked content
- Rate limit aggressively
- Return CAPTCHA challenges
- Redirect to error pages
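Because these failures are easy to miss, it helps to check responses explicitly; the sketch below flags the most common symptoms (the status codes and the crude CAPTCHA text check are assumptions about how a block typically surfaces):

import requests

def fetch(url, headers=None):
    # Raise early if the response looks like a block rather than real content
    response = requests.get(url, headers=headers)
    if response.status_code in (403, 429):
        raise RuntimeError(f'Blocked or rate limited: HTTP {response.status_code}')
    if 'captcha' in response.text.lower():
        raise RuntimeError('Received a CAPTCHA challenge instead of content')
    return response.text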
Setting a realistic User-Agent:
In Python (requests):
import requests

url = 'https://example.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
}
response = requests.get(url, headers=headers)
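If you make several requests, a common pattern is to set the header once on a Session so every request inherits it (a short sketch, using a placeholder URL):

import requests

session = requests.Session()
# Every request made through this session will carry this header
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
})
response = session.get('https://example.com')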
Best practices:
- Use recent, realistic User-Agent strings
- Match the User-Agent to your target audience (mobile vs desktop)
- Rotate User-Agents if making many requests (see the rotation sketch after this list)
- Keep User-Agents up to date as browsers release new versions
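A minimal sketch of User-Agent rotation, assuming a small, hand-maintained pool of strings (the entries below are illustrative and should be refreshed as browsers update):

import random
import requests

# Illustrative pool; keep it small, realistic, and up to date
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
]

def get_with_random_ua(url):
    # Pick a different User-Agent for each request
    headers = {'User-Agent': random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers)

response = get_with_random_ua('https://example.com')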
A proper User-Agent is the most basic requirement for ethical and successful web scraping.