What image formats can be extracted from web pages?

Web pages typically contain several image formats, each optimized for different purposes.

Common image formats:

  • JPEG/JPG - The most common format for photographs and complex images with gradients, offering good compression with some quality loss
  • PNG - Widely used for graphics, logos, and images requiring transparency, providing lossless compression
  • GIF - Supports simple animations and transparent backgrounds, but each frame is limited to a palette of 256 colors
  • WebP - A modern format from Google that typically produces smaller files than JPEG or PNG at comparable quality, supporting both lossy and lossless compression plus transparency and animation
  • SVG (Scalable Vector Graphics) - An XML-based vector format, well suited to logos and icons that need to scale without quality loss
  • AVIF - A newer format that often compresses even better than WebP, though some older browsers do not support it
  • ICO - Used for favicons and small icons
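
Every binary format in the list above begins with a recognizable byte signature, which is one reliable way to tell them apart regardless of file name. The Python sketch below checks those signatures. It is illustrative only: the function name `sniff_image_format` is hypothetical, the AVIF check looks only at the major brand (some AVIF files declare `mif1` there and list `avif` among their compatible brands), and because SVG is plain XML text its check is a rough heuristic.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess an image format from its leading bytes (hypothetical helper)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP: a RIFF container whose form type (bytes 8-11) is "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # AVIF: an ISO-BMFF "ftyp" box at offset 4; only the major brand is checked here
    if data[4:8] == b"ftyp" and data[8:12] in (b"avif", b"avis"):
        return "avif"
    # ICO: reserved field 0 followed by image type 1 (icon)
    if data.startswith(b"\x00\x00\x01\x00"):
        return "ico"
    # SVG heuristic: XML-ish text near the start that contains an <svg tag
    head = data[:1024].lstrip().lower()
    if head.startswith((b"<svg", b"<?xml", b"<!doctype")) and b"<svg" in head:
        return "svg"
    return None
```

For example, `sniff_image_format(b"GIF89a...")` returns `"gif"`, while unrecognized bytes return `None`.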

Format detection:

When scraping images, check the Content-Type response header to identify the actual format, because file extensions can be misleading or missing. Some sites also offer the same image in several formats through a <picture> element containing multiple <source> elements, letting the browser pick the first format it supports.
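
A minimal sketch of the Content-Type check in Python, assuming you already have the header value as a string. The mapping table and the function name `format_from_content_type` are hypothetical; both `image/x-icon` and `image/vnd.microsoft.icon` appear in practice for ICO files.

```python
from typing import Optional

# Hypothetical mapping from image MIME types to short format names
MIME_TO_FORMAT = {
    "image/jpeg": "jpeg",
    "image/png": "png",
    "image/gif": "gif",
    "image/webp": "webp",
    "image/avif": "avif",
    "image/svg+xml": "svg",
    "image/x-icon": "ico",
    "image/vnd.microsoft.icon": "ico",
}


def format_from_content_type(header: Optional[str]) -> Optional[str]:
    """Map a Content-Type header value to a format name, ignoring the file extension."""
    if not header:
        return None
    # Drop parameters such as "; charset=utf-8" and normalize case and whitespace
    mime = header.split(";", 1)[0].strip().lower()
    return MIME_TO_FORMAT.get(mime)
```

With `urllib.request.urlopen`, the header value is available as `response.headers.get("Content-Type")`. Some servers send a generic type such as `application/octet-stream`; the sketch returns `None` in that case, and a scraper would then need another check, such as inspecting the file's leading bytes.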

Modern sites often provide WebP or AVIF to supporting browsers while falling back to JPEG/PNG for older browsers. Our Image Extractor identifies all these formats and allows filtering by file type.
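
To capture every variant, including the fallback, a scraper has to read the <source> candidates as well as the <img> inside a <picture> element. The Python sketch below uses the standard library's `html.parser.HTMLParser`; the class name `PictureSourceParser` is hypothetical, and the sketch ignores `media` attributes, nested pictures, and commas inside URLs. A real tool would also resolve relative URLs, for example with `urllib.parse.urljoin`.

```python
from html.parser import HTMLParser


class PictureSourceParser(HTMLParser):
    """Collect (declared type, URL) pairs from <picture> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_picture = False
        self.candidates = []  # e.g. [("image/avif", "/a.avif"), ("fallback", "/a.jpg")]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "picture":
            self.in_picture = True
        elif tag == "source" and self.in_picture:
            # srcset may list several URLs with descriptors: "a.webp 1x, a@2x.webp 2x"
            for entry in (attrs.get("srcset") or "").split(","):
                parts = entry.split()
                if parts:
                    self.candidates.append((attrs.get("type"), parts[0]))
        elif tag == "img" and self.in_picture and attrs.get("src"):
            # The <img> is what browsers without support for any <source> type display
            self.candidates.append(("fallback", attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "picture":
            self.in_picture = False
```

Calling `parser.feed(html)` and then reading `parser.candidates` yields the AVIF and WebP sources in document order, followed by the JPEG or PNG fallback.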
