How do I extract lazy-loaded images?

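Lazy-loaded images are harder to scrape because the page defers loading them until the user scrolls near them. That speeds up the initial page load, but it also means the real image URLs may not be in the src attributes of the markup you first receive.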

Script-based lazy loading typically leaves the standard src attribute empty or pointing at a small placeholder, and stores the real image URL in a data attribute such as data-src, data-lazy, data-original, or data-srcset. A script copies that URL into src only when the image enters the viewport.

Extracting from static HTML:

If the real URLs are already present in the HTML, you don't need a browser. Parse the markup and read the common data attributes: data-src, data-lazy-src, data-original, data-srcset, data-background, or custom attributes specific to lazy-loading libraries like Lazysizes or Lozad.
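As a rough illustration of the static approach, here is a minimal Python sketch that uses only the standard library. The names LAZY_ATTRS, LazyImageParser, and extract_image_urls are made up for this example, and the attribute list is just the one given above; a real site may use different attributes. The srcset handling is a simplification that assumes URLs contain no commas.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attribute names from the list above; extend with site-specific ones as needed.
LAZY_ATTRS = ("data-src", "data-lazy-src", "data-original", "data-srcset", "data-background")


class LazyImageParser(HTMLParser):
    """Collects image URLs from lazy-loading attributes, falling back to src."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        candidates = []
        for name in LAZY_ATTRS:
            value = attr_map.get(name)
            if not value:
                continue
            if name.endswith("srcset"):
                # srcset is a comma-separated list of "url descriptor" entries.
                candidates += [part.split()[0] for part in value.split(",") if part.strip()]
            else:
                candidates.append(value)
        # Use src only when the tag is an <img> with no lazy attribute,
        # since src may otherwise be a placeholder.
        if tag == "img" and not candidates and attr_map.get("src"):
            candidates.append(attr_map["src"])
        for url in candidates:
            absolute = urljoin(self.base_url, url.strip())
            if absolute not in self.urls:
                self.urls.append(absolute)


def extract_image_urls(html, base_url=""):
    """Return de-duplicated absolute image URLs in document order."""
    parser = LazyImageParser(base_url)
    parser.feed(html)
    return parser.urls
```

Calling extract_image_urls(html, "https://example.com/") on a downloaded page returns absolute URLs, preferring the data attribute over src whenever both are present.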

For JavaScript-heavy sites:

You'll need a headless browser like Puppeteer or Playwright that can execute JavaScript and trigger lazy loading by scrolling the page. Use page.evaluate() to scroll incrementally and page.waitForSelector() to wait for images to appear.
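The scrolling approach might look roughly like this with Playwright's Python API, where the calls are page.evaluate() and page.wait_for_selector() (the snake_case counterpart of waitForSelector). This is a sketch under assumptions, not a definitive implementation: scroll_offsets and collect_image_urls are hypothetical names, the step size and pause are arbitrary defaults, and the page height is measured only once, so pages that keep growing as you scroll would need a loop that re-measures.

```python
def scroll_offsets(total_height, step):
    """Return the y offsets to visit, ending exactly at the page bottom."""
    offsets = list(range(step, total_height, step))
    offsets.append(total_height)
    return offsets


def collect_image_urls(url, step=800, pause_ms=300):
    # Imported here so scroll_offsets can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        page.wait_for_selector("img")  # wait until at least one image element exists
        height = page.evaluate("document.body.scrollHeight")
        for y in scroll_offsets(height, step):
            page.evaluate("y => window.scrollTo(0, y)", y)
            page.wait_for_timeout(pause_ms)  # give the lazy loader time to fill in src
        # Read the URL each image actually resolved to after scrolling.
        urls = page.evaluate(
            "() => Array.from(document.images, img => img.currentSrc || img.src).filter(Boolean)"
        )
        browser.close()
        return urls
```

Scrolling in viewport-sized steps rather than jumping straight to the bottom matters because many lazy loaders only react to images that actually pass through the viewport.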

Advanced techniques:

  • Disable lazy loading by intercepting requests and modifying the JavaScript
  • Call the lazy-loading trigger functions directly
  • Watch the Network tab in browser DevTools while scrolling to see which requests load images and identify the triggering mechanism
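The Network tab check can also be approximated in a script. The sketch below, again assuming Playwright for Python, records image requests and splits them into those made during the initial load and those that only appeared while scrolling. ImageRequestLog and log_image_requests are hypothetical names, and the scroll count, step, and pause are arbitrary.

```python
class ImageRequestLog:
    """Keeps the URLs of image requests in the order they were first seen."""

    def __init__(self):
        self.urls = []

    def record(self, resource_type, url):
        # Skip non-image traffic, inline data: URIs, and repeats.
        if resource_type != "image" or url.startswith("data:") or url in self.urls:
            return
        self.urls.append(url)


def log_image_requests(url, scrolls=10, step=800, pause_ms=300):
    # Imported lazily so ImageRequestLog works without Playwright installed.
    from playwright.sync_api import sync_playwright

    log = ImageRequestLog()
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Playwright passes a Request object exposing resource_type and url.
        page.on("request", lambda request: log.record(request.resource_type, request.url))
        page.goto(url)
        initial = len(log.urls)
        for _ in range(scrolls):
            page.evaluate("step => window.scrollBy(0, step)", step)
            page.wait_for_timeout(pause_ms)
        browser.close()
    # Requests logged after the initial load are the ones scrolling triggered.
    return log.urls[:initial], log.urls[initial:]
```

If the second list is empty, the images were probably all present at load time and the static approach is likely enough; if it is not, its URLs show what the lazy loader fetches as you scroll.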

Our Image Extractor analyzes both standard src attributes and common lazy-loading data attributes to capture all images.
