How do I avoid getting blocked when web scraping with Node.js?
Set proper headers:
- Use realistic User-Agent strings from recent browsers
- Include Accept, Accept-Language, Accept-Encoding headers
- Add Referer header when navigating between pages
- Maintain consistent header sets that match real browsers
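For example, with Node 18+'s built-in fetch (the header values below are copied from a recent Chrome release and will age; treat them as placeholders):

```js
// Header set mimicking a recent desktop Chrome; update the UA string periodically.
const browserHeaders = {
  'User-Agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
  'Accept':
    'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.9',
  'Accept-Encoding': 'gzip, deflate, br',
};

async function fetchPage(url, referer) {
  const headers = { ...browserHeaders };
  if (referer) headers['Referer'] = referer; // set when following links between pages
  const res = await fetch(url, { headers });
  return res.text();
}
```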
Implement rate limiting:
- Add delays between requests (500–2000 ms)
- Limit concurrent requests (3-5 simultaneous connections)
- Use exponential backoff on errors
- Respect Retry-After headers in 429 responses
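A minimal sketch of the retry/backoff part, assuming Node 18+ (global fetch); for capping concurrency, a small library like p-limit does the job:

```js
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const randomDelay = () => 500 + Math.random() * 1500; // 500–2000 ms between requests

async function politeFetch(url, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url);
    if (res.status === 429) {
      // Honor Retry-After when present; otherwise back off exponentially: 2s, 4s, 8s.
      // (Retry-After may also be an HTTP date; Number() then yields NaN and we fall back.)
      const retryAfter = Number(res.headers.get('retry-after'));
      await sleep(retryAfter ? retryAfter * 1000 : 2000 * 2 ** attempt);
      continue;
    }
    return res;
  }
  throw new Error(`Gave up on ${url} after ${maxRetries} retries`);
}

// Usage: await politeFetch(url); await sleep(randomDelay());
```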
Rotate IP addresses:
- Use proxy services (residential proxies are blocked far less often than datacenter IPs)
- Rotate proxies between requests or sessions
- Handle proxy failures gracefully with fallbacks
- Monitor proxy health and performance
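A sketch of pool rotation with failover, using undici's ProxyAgent; the proxy URLs are placeholders for whatever pool your provider issues:

```js
import { fetch, ProxyAgent } from 'undici';

const proxies = [
  'http://user:pass@proxy1.example.com:8000',
  'http://user:pass@proxy2.example.com:8000',
];
let cursor = 0;

async function fetchViaProxy(url) {
  // Rotate through the pool; on failure, fall through to the next proxy.
  for (let i = 0; i < proxies.length; i++) {
    const proxy = proxies[(cursor + i) % proxies.length];
    try {
      const res = await fetch(url, { dispatcher: new ProxyAgent(proxy) });
      cursor = (cursor + i + 1) % proxies.length; // next call starts on a fresh proxy
      return res;
    } catch (err) {
      console.warn(`Proxy ${proxy} failed: ${err.message}`); // feed this into health tracking
    }
  }
  throw new Error('All proxies failed');
}
```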
Use Puppeteer stealth techniques:
- Install puppeteer-extra-plugin-stealth to hide automation indicators
- Randomize viewport sizes and user agents
- Simulate human-like mouse movements and delays
- Handle CAPTCHAs with solving services when necessary
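A minimal stealth setup (npm i puppeteer puppeteer-extra puppeteer-extra-plugin-stealth); the viewport ranges and delays are arbitrary illustrative values:

```js
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

puppeteer.use(StealthPlugin()); // patches common automation indicators

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();

// Randomize the viewport so sessions don't share a single fingerprint.
await page.setViewport({
  width: 1200 + Math.floor(Math.random() * 300),
  height: 700 + Math.floor(Math.random() * 200),
});

await page.goto('https://example.com', { waitUntil: 'networkidle2' });

// Human-ish pause and mouse movement before interacting with the page.
await new Promise((r) => setTimeout(r, 1000 + Math.random() * 2000));
await page.mouse.move(200 + Math.random() * 400, 150 + Math.random() * 300, { steps: 25 });

await browser.close();
```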
Respect robots.txt:
- Parse and follow robots.txt rules
- Check Crawl-delay directives
- Avoid disallowed paths
- Consider reaching out for API access or permission
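A sketch using the robots-parser package; MyScraperBot is a hypothetical user-agent token for your crawler:

```js
import robotsParser from 'robots-parser'; // npm i robots-parser

const BOT_UA = 'MyScraperBot';

async function loadRobots(origin) {
  const robotsUrl = new URL('/robots.txt', origin).href;
  const body = await (await fetch(robotsUrl)).text();
  return robotsParser(robotsUrl, body);
}

const robots = await loadRobots('https://example.com');
const target = 'https://example.com/some/path';

if (robots.isAllowed(target, BOT_UA)) {
  const crawlDelay = robots.getCrawlDelay(BOT_UA); // seconds, or undefined if unset
  // ...fetch the page, waiting crawlDelay seconds between requests when it's set
}
```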
Session management:
- Maintain cookies across requests using cookie jars
- Handle authentication properly
- Don't needlessly start a fresh session for every request
- Store session state to resume after interruptions
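A sketch with got and tough-cookie; cookies.json is an arbitrary path for persisted session state:

```js
import got from 'got';
import { CookieJar } from 'tough-cookie';
import { existsSync, readFileSync, writeFileSync } from 'node:fs';

const JAR_FILE = 'cookies.json';

// Resume a previous session from disk if one exists.
const cookieJar = existsSync(JAR_FILE)
  ? CookieJar.deserializeSync(JSON.parse(readFileSync(JAR_FILE, 'utf8')))
  : new CookieJar();

const client = got.extend({ cookieJar }); // every request through `client` shares the jar

await client('https://example.com/account'); // cookies persist across calls

// Save the jar so an interrupted run can resume where it left off.
writeFileSync(JAR_FILE, JSON.stringify(cookieJar.serializeSync()));
```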
Error handling and retries:
- Implement exponential backoff (retry after 2s, 4s, 8s)
- Distinguish between temporary (retry) and permanent (skip) errors
- Log failures for analysis
- Circuit-break to stop hitting a failing target
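The backoff loop above already covers retries; this sketch adds error classification and a crude circuit breaker (the status list and threshold are illustrative):

```js
const PERMANENT = new Set([401, 404, 410]); // skip these; retrying won't help
const BREAKER_LIMIT = 5;
let consecutiveFailures = 0;

async function guardedFetch(url) {
  if (consecutiveFailures >= BREAKER_LIMIT) {
    throw new Error('Circuit open: target keeps failing, stop hitting it for a while');
  }
  try {
    const res = await fetch(url);
    if (PERMANENT.has(res.status)) {
      console.error(`Permanent error ${res.status} for ${url}; skipping`); // log for analysis
      return null;
    }
    if (!res.ok) throw new Error(`Temporary error ${res.status}`); // let the retry loop handle it
    consecutiveFailures = 0; // success closes the breaker
    return res;
  } catch (err) {
    consecutiveFailures++;
    console.error(`Failure #${consecutiveFailures} for ${url}: ${err.message}`);
    throw err;
  }
}
```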
Behavioral patterns:
- Scrape during off-peak hours
- Vary request timing (don't be too regular)
- Start slowly and increase rate gradually
- Alternate between different pages/sections
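A sketch of irregular pacing: jittered delays, a shuffled crawl order, and a rate that ramps up from a slow start (all constants are illustrative):

```js
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const jitter = (baseMs) => baseMs * (0.5 + Math.random()); // ±50% around the base

async function crawl(urls, startDelayMs = 4000, minDelayMs = 1000) {
  // Fisher–Yates shuffle so pages/sections aren't hit in a predictable order.
  const order = [...urls];
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }

  let delay = startDelayMs; // start slowly...
  for (const url of order) {
    await fetch(url);
    await sleep(jitter(delay));
    delay = Math.max(minDelayMs, delay * 0.9); // ...and ramp up gradually
  }
}
```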
Technical measures:
- Use HTTP/2 when supported
- Enable compression (gzip, brotli)
- Handle redirects properly
- Validate TLS certificates (don't disable verification just to silence errors)
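With got, most of this is a few client options (several are shown explicitly here even though they're got's defaults):

```js
import got from 'got';

const client = got.extend({
  http2: true,                         // negotiate HTTP/2 when the server supports it
  decompress: true,                    // accept and decode gzip/brotli (got's default)
  followRedirect: true,                // follow 3xx responses (got's default)
  maxRedirects: 5,
  https: { rejectUnauthorized: true }, // keep TLS certificate validation on (got's default)
});

const { statusCode } = await client('https://example.com');
```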
Monitoring:
- Track error rates and response codes
- Monitor for blocks (403, 429, CAPTCHAs)
- Log response times to detect throttling
- Adjust strategy based on patterns
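A sketch of block detection: count status codes and response times, and flag when too many requests look blocked (the 10% threshold is illustrative):

```js
const stats = { total: 0, blocked: 0, totalMs: 0, byStatus: {} };

async function monitoredFetch(url) {
  const start = Date.now();
  const res = await fetch(url);
  stats.total++;
  stats.totalMs += Date.now() - start; // a rising average can indicate throttling
  stats.byStatus[res.status] = (stats.byStatus[res.status] ?? 0) + 1;
  if (res.status === 403 || res.status === 429) stats.blocked++;

  if (stats.total >= 20 && stats.blocked / stats.total > 0.1) {
    console.warn('Possible block: slow down, rotate identity, or pause', stats);
  }
  return res;
}
```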
Legal and ethical:
- Review terms of service
- Avoid scraping personal data without consent
- Don't overload servers
- Consider official APIs when available