How do I extract JavaScript-generated links?
JavaScript-generated links are created dynamically by client-side code after the initial HTML loads, making them invisible to simple HTML parsers that only see static content.
Modern single-page applications (SPAs) built with React, Vue, Angular, or vanilla JavaScript often render most or all links via JavaScript.
Headless browser approach (most reliable):
Use tools like Puppeteer, Playwright, or Selenium that run a real browser engine and execute JavaScript:
- Navigate to the page with page.goto()
- Wait for content to load with page.waitForSelector() or page.waitForLoadState()
- Optionally interact with the page (scroll, click buttons) to trigger more links
- Extract links using page.$$eval('a', links => links.map(a => a.href))
This approach picks up any link that ends up in the rendered DOM, regardless of how the JavaScript produced it, but it is slower and more resource-intensive than parsing static HTML.
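The steps above can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: the function name extractRenderedLinks and the 15-second timeout are assumptions, and page is assumed to be a Page object from Playwright or Puppeteer, both of which provide goto(), waitForSelector(), and $$eval().

```javascript
// Sketch: collect every anchor href from a rendered page.
// `page` is assumed to be a Playwright or Puppeteer Page object.
// The function name and the default timeout are illustrative choices.
async function extractRenderedLinks(page, url, { timeout = 15000 } = {}) {
  await page.goto(url);

  // Wait for client-side rendering to produce at least one anchor.
  // (Playwright also waits for the element to be visible by default.)
  await page.waitForSelector('a[href]', { timeout });

  // The callback runs inside the browser; a.href is the resolved absolute URL.
  const hrefs = await page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));

  // De-duplicate while keeping first-seen order.
  return [...new Set(hrefs)];
}
```

With Playwright, a suitable page could come from chromium.launch() followed by browser.newPage(); the browser should be closed once extraction is done. Keeping the helper independent of the launcher makes it easy to reuse the same page for further interaction.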
Static analysis approach (faster but limited):
- Examine the JavaScript code to understand how links are generated
- Look for URL patterns in JavaScript files or inline <script> tags
- Find data attributes that might contain URLs (like data-url, data-href)
- Check for frameworks' routing configurations (React Router, Vue Router)
- Extract hardcoded URL strings
This requires understanding the site's structure and can miss links that are assembled at runtime, but it is much faster because no browser has to be started.
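As a rough illustration of the static approach, the sketch below scans raw HTML or JavaScript text for data-url / data-href attributes and for quoted strings that look like URLs or root-relative paths. The function name extractStaticUrls and both regular expressions are assumptions made for this example; they are heuristics, not a real parser, and will produce both false positives and misses.

```javascript
// Sketch: pull candidate URLs out of raw HTML/JS text without executing it.
// The patterns below are illustrative heuristics only.
function extractStaticUrls(source, baseUrl) {
  const found = new Set();
  const patterns = [
    // data-url="..." or data-href='...'
    /data-(?:url|href)\s*=\s*["']([^"']+)["']/g,
    // Quoted absolute URLs or root-relative paths, e.g. "/products/42"
    /["'`]((?:https?:\/\/|\/)[^"'`\s<>]+)["'`]/g,
  ];

  for (const pattern of patterns) {
    for (const match of source.matchAll(pattern)) {
      try {
        // Resolve relative paths against the page URL.
        found.add(new URL(match[1], baseUrl).href);
      } catch {
        // Not a parsable URL; skip it.
      }
    }
  }
  return [...found];
}
```

Candidates found this way still need filtering, since strings such as API endpoints or asset paths match the same patterns as navigable links.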
Hybrid approach:
- Load the page with a headless browser
- Wait for initial rendering
- Inject JavaScript to extract links
- Trigger any pagination or "load more" buttons
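One possible shape for this hybrid flow is sketched below. It assumes page is a Playwright or Puppeteer Page object; the function name extractWithLoadMore, the .load-more selector, the click limit, and the fixed pause are all hypothetical values chosen for illustration and would need adjusting per site.

```javascript
// Sketch of the hybrid flow: render, extract via injected JS, click "load more", repeat.
// `page` is assumed to be a Playwright or Puppeteer Page object.
// The selector, maxClicks, and pauseMs defaults are hypothetical.
async function extractWithLoadMore(page, url, options = {}) {
  const { loadMoreSelector = '.load-more', maxClicks = 10, pauseMs = 500 } = options;
  const seen = new Set();

  // Inject a function into the page that reads anchors and data-href/data-url attributes.
  const collect = async () => {
    const urls = await page.evaluate(() => {
      const out = [];
      for (const a of document.querySelectorAll('a[href]')) out.push(a.href);
      for (const el of document.querySelectorAll('[data-href], [data-url]')) {
        const raw = el.getAttribute('data-href') || el.getAttribute('data-url');
        if (!raw) continue;
        try {
          out.push(new URL(raw, document.baseURI).href);
        } catch {
          // Skip values that are not valid URLs.
        }
      }
      return out;
    });
    urls.forEach((u) => seen.add(u));
  };

  await page.goto(url);
  await page.waitForSelector('a[href]'); // wait for initial rendering
  await collect();

  for (let i = 0; i < maxClicks; i++) {
    // page.$() resolves to null when the button is no longer in the DOM.
    const button = await page.$(loadMoreSelector);
    if (!button) break;
    await button.click();
    // Crude fixed pause so the next batch can render.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
    await collect();
  }
  return [...seen];
}
```

Collecting after every click, rather than once at the end, keeps links that a page removes from the DOM when it swaps in the next batch.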
Common patterns:
- Single-page apps with client-side routing (watch for pushState or hash changes)
- Infinite scrolling (need to scroll to trigger loading)
- Lazy-loaded content (wait for intersection observers)
- Click-to-reveal links (simulate clicks on dropdown menus or expandable sections)
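For the infinite-scrolling and lazy-loading patterns, one common tactic is to scroll to the bottom repeatedly until the number of links stops growing. The sketch below assumes page is a Playwright or Puppeteer Page object that has already navigated to the target URL; the name scrollUntilStable, the round limit, and the one-second pause are assumptions for illustration.

```javascript
// Sketch: keep scrolling until the anchor count stops increasing.
// `page` is assumed to be an already-loaded Playwright or Puppeteer Page.
// maxRounds and pauseMs are illustrative defaults, not recommended values.
async function scrollUntilStable(page, { maxRounds = 20, pauseMs = 1000 } = {}) {
  let previousCount = -1;

  for (let round = 0; round < maxRounds; round++) {
    const count = await page.$$eval('a[href]', (anchors) => anchors.length);
    if (count === previousCount) break; // nothing new appeared since the last scroll
    previousCount = count;

    // Jump to the bottom so scroll listeners and intersection observers fire.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));

    // Give the page time to fetch and render the next batch.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }

  return page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));
}
```

A fixed pause is the weakest part of this sketch: if a batch takes longer than pauseMs to arrive, the loop stops early, so slow sites may need a longer pause or a condition-based wait.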
Our Link Extractor works on static HTML; for JavaScript-heavy sites, we recommend using headless browsers or browser DevTools to capture the rendered DOM.