What image formats can be extracted from web pages?
Web pages typically contain several image formats, each optimized for different purposes.
Common image formats:
- JPEG/JPG - The most common format for photographs and complex images with gradients, offering good compression with some quality loss
- PNG - Widely used for graphics, logos, and images requiring transparency, providing lossless compression
- GIF - Supports simple animations and transparent backgrounds but is limited to 256 colors
- WebP - A modern format from Google that typically produces smaller files than JPEG or PNG at comparable quality, supporting both lossy and lossless compression plus transparency and animation
- SVG (Scalable Vector Graphics) - An XML-based vector format, well suited to logos and icons that need to scale without quality loss
- AVIF - A newer format, based on the AV1 video codec, that often compresses better than WebP; current major browsers support it, but older browser versions do not
- ICO - Used for favicons and small icons
Format detection:
When scraping images, check the Content-Type response header to identify the actual format, as file extensions can be misleading or missing. Some sites serve different formats based on browser capabilities using the <picture> tag with multiple <source> elements.
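As a rough illustration of the header check described above, the following Python sketch maps a Content-Type value to a format name and, when the header is missing or generic, falls back to looking at the first bytes of the file. The function names and the fallback step are assumptions made for this example, not a description of how any particular tool works, and the SVG check is only a heuristic.

```python
from typing import Optional

# MIME types commonly used for the formats listed above.
MIME_TO_FORMAT = {
    "image/jpeg": "jpeg",
    "image/png": "png",
    "image/gif": "gif",
    "image/webp": "webp",
    "image/svg+xml": "svg",
    "image/avif": "avif",
    "image/x-icon": "ico",
    "image/vnd.microsoft.icon": "ico",
}


def format_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Map a Content-Type header value to a format name, ignoring parameters."""
    if not content_type:
        return None
    mime = content_type.split(";", 1)[0].strip().lower()
    return MIME_TO_FORMAT.get(mime)


def format_from_magic_bytes(head: bytes) -> Optional[str]:
    """Guess the format from the first bytes of the file (hypothetical helper)."""
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    if head[4:8] == b"ftyp" and head[8:12] in (b"avif", b"avis"):
        return "avif"
    if head.startswith(b"\x00\x00\x01\x00"):
        return "ico"
    if b"<svg" in head.lower():
        return "svg"  # heuristic: SVG is text, so there is no fixed signature
    return None


def detect_image_format(content_type: Optional[str], head: bytes) -> Optional[str]:
    """Prefer the Content-Type header; fall back to the file signature."""
    return format_from_content_type(content_type) or format_from_magic_bytes(head)
```

In practice, `head` would be the first few dozen bytes of the downloaded response body, and `content_type` the raw header value returned by whichever HTTP client is in use.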
Modern sites often provide WebP or AVIF to supporting browsers while falling back to JPEG/PNG for older browsers. Our Image Extractor identifies all these formats and allows filtering by file type.
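To see which formats a page offers through the <picture> fallback pattern, a scraper can collect every <source> candidate along with the <img> fallback. The sketch below uses Python's standard html.parser module; the class and function names are hypothetical, and the comma-based srcset split is a simplification that would mis-handle URLs containing commas.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class PictureSourceParser(HTMLParser):
    """Collect (url, declared MIME type) pairs from <picture>/<source>/<img>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_picture = False
        self.candidates: List[Tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "picture":
            self.in_picture = True
        elif tag == "source" and self.in_picture:
            # Naive srcset parsing: split on commas, keep the URL part only.
            for entry in (attr.get("srcset") or "").split(","):
                parts = entry.split()
                if parts:
                    self.candidates.append((parts[0], attr.get("type")))
        elif tag == "img" and attr.get("src"):
            # The <img> is the fallback; its format is not declared in markup.
            self.candidates.append((attr["src"], None))

    def handle_endtag(self, tag):
        if tag == "picture":
            self.in_picture = False


def extract_image_candidates(html: str) -> List[Tuple[str, Optional[str]]]:
    """Return every image URL found, with the type attribute when present."""
    parser = PictureSourceParser()
    parser.feed(html)
    return parser.candidates
```

Each returned pair holds a URL and the type attribute declared on its <source>, or None for the <img> fallback, whose real format still has to be confirmed from the response itself.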