How do I scrape dynamic content loaded with JavaScript in Python?
Dynamic JavaScript content requires different approaches than static HTML scraping.
Strategy 1: Find the API endpoint (best):
Often, JavaScript-rendered content comes from API calls:
- Open browser DevTools Network tab
- Reload the page and filter for XHR/Fetch requests
- Find the API endpoint returning JSON data
- Make direct requests to that endpoint with the requests library
This is faster and more reliable than browser automation.
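The steps above can be sketched with requests. Note that the endpoint URL and the JSON shape (`{"items": [...]}`) are assumptions for illustration — substitute whatever you actually see in the Network tab:

```python
import requests

# Hypothetical endpoint discovered in the DevTools Network tab
API_URL = "https://example.com/api/products"

def fetch_items(url=API_URL, session=None):
    """Call the JSON API directly and return its list of items."""
    s = session or requests.Session()
    # Send the headers/params the browser sent, so the server responds the same way
    resp = s.get(url, headers={"Accept": "application/json"},
                 params={"page": 1}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("items", [])
```

Passing a Session lets you reuse cookies and connections across calls, and makes the function easy to test with a stub.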
Strategy 2: Use Selenium for browser automation:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "https://example.com"  # replace with the page you are scraping

driver = webdriver.Chrome()
driver.get(url)

# Wait up to 10 seconds for the dynamic content to appear
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "dynamic-content"))
)

html = driver.page_source
driver.quit()
Strategy 3: Use Playwright (modern alternative):
Playwright tends to be faster and more stable than Selenium, and its auto-waiting API requires less boilerplate:
from playwright.sync_api import sync_playwright

url = "https://example.com"  # replace with the page you are scraping

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(url)
    # Blocks until an element matching the selector is in the DOM
    page.wait_for_selector('.dynamic-content')
    html = page.content()
    browser.close()
When to use each approach:
- API endpoint: Always try this first - fastest and most reliable
- Selenium: When you need to interact (click buttons, fill forms)
- Playwright: Better than Selenium for most modern sites
Performance impact:
Browser automation is typically 10-100x slower than direct HTTP requests, because it has to launch a browser, download every asset, and execute the page's JavaScript. Always prefer API endpoints when available.