How do I extract meta tags from HTML?

Extracting meta tags involves parsing the HTML <head> section and identifying all <meta> tags along with their attributes.

Tools to use:

Use an HTML parser appropriate for your language, or the browser itself for one-off checks:

  • Cheerio (Node.js)
  • Beautiful Soup (Python)
  • Browser DevTools (manual inspection of a single page)

Basic extraction:

  1. Select all <meta> tags using CSS selectors:
    • meta[name] for standard meta tags
    • meta[property] for Open Graph tags
  2. Extract attributes:
    • name or property (the meta tag identifier)
    • content (the meta tag value)
  3. Also extract the <title> tag and relevant <link> tags (canonical, alternate languages, RSS feeds)
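
Step 3 needs slightly different handling from the meta selectors, because the meaning of a <link> tag depends on its rel and type attributes. As a rough sketch, assuming the <link> elements have already been read into plain objects with rel, href, hreflang, and type fields (the input shape and the name classifyLinks are illustrative assumptions, and treating Atom feeds alongside RSS is a choice made here, not something a parser requires):

```javascript
// Feed MIME types treated as "feeds" in this sketch (Atom included by assumption).
const FEED_TYPES = new Set(['application/rss+xml', 'application/atom+xml']);

// links: [{ rel, href, hreflang, type }] (assumed input shape)
function classifyLinks(links) {
  const result = { canonical: null, alternates: [], feeds: [] };
  for (const link of links) {
    if (!link.href) continue; // a link without href is not useful here
    // rel is a space-separated, case-insensitive list of tokens
    const rels = (link.rel || '').toLowerCase().split(/\s+/).filter(Boolean);
    if (rels.includes('canonical') && result.canonical === null) {
      result.canonical = link.href; // keep the first canonical found
    }
    if (rels.includes('alternate')) {
      const type = (link.type || '').toLowerCase();
      if (FEED_TYPES.has(type)) {
        result.feeds.push({ href: link.href, type });
      } else if (link.hreflang) {
        result.alternates.push({ href: link.href, hreflang: link.hreflang });
      }
    }
  }
  return result;
}
```

With Cheerio, the input array could be built from the attributes of each element matched by $('link[rel]').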

Organizing extracted data:

  • Group by category (SEO, Open Graph, Twitter Cards, technical tags)
  • Create key-value pairs where the name/property is the key and content is the value
  • Handle multiple tags with the same name (like repeated keywords or og:image tags) by collecting their values in a list instead of overwriting
  • Preserve order when it matters
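
One way to apply these points, as a minimal sketch: the function below assumes the tags have already been collected into an array of { key, content } objects in document order. That input shape, the name groupMetaTags, and the category rules are illustrative assumptions rather than any standard. Repeated keys are promoted to arrays so that nothing is overwritten.

```javascript
// Hypothetical category rules: prefix-based for Open Graph and Twitter Cards,
// a small allow-list for technical tags, everything else under "seo".
const TECHNICAL_KEYS = new Set(['viewport', 'theme-color', 'generator']);

function categoryFor(key) {
  const k = key.toLowerCase();
  if (k.startsWith('og:')) return 'openGraph';
  if (k.startsWith('twitter:')) return 'twitterCards';
  if (TECHNICAL_KEYS.has(k)) return 'technical';
  return 'seo';
}

// entries: [{ key, content }] in document order (assumed input shape)
function groupMetaTags(entries) {
  const grouped = { seo: {}, openGraph: {}, twitterCards: {}, technical: {} };
  for (const { key, content } of entries) {
    if (!key || content === undefined) continue; // skip incomplete tags
    const bucket = grouped[categoryFor(key)];
    if (!Object.prototype.hasOwnProperty.call(bucket, key)) {
      bucket[key] = content; // first occurrence: plain value
    } else if (Array.isArray(bucket[key])) {
      bucket[key].push(content); // third or later occurrence
    } else {
      bucket[key] = [bucket[key], content]; // second occurrence: promote to array
    }
  }
  return grouped;
}
```

Because JavaScript objects keep insertion order for string keys like these, the order of first appearance within each category is preserved.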

Example code (Node.js/Cheerio):

import * as cheerio from 'cheerio';

// `html` is assumed to be a string containing the page's HTML source
const $ = cheerio.load(html);
const metaTags = {};

// Extract standard meta tags
$('meta[name]').each((i, el) => {
  const name = $(el).attr('name');
  const content = $(el).attr('content');
  metaTags[name] = content;
});

// Extract Open Graph tags
$('meta[property]').each((i, el) => {
  const property = $(el).attr('property');
  const content = $(el).attr('content');
  metaTags[property] = content;
});

// Extract title; first() avoids concatenating extra <title> elements (e.g. inside inline SVGs)
metaTags.title = $('title').first().text().trim();

Handling edge cases:

Some sites:

  • Use non-standard meta tag attributes
  • Generate meta tags dynamically via JavaScript (requiring a headless browser to render the page first)
  • Have malformed HTML with unclosed or nested meta tags
  • Use custom or platform-specific meta tags (like fb:app_id for Facebook)
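
For tags that carry their identifier somewhere other than the usual name or property attribute, one possible approach is to check several attributes in a fixed order of preference. The sketch below assumes each tag's attributes are already available as a plain object (for example, el.attribs on a Cheerio element); the name metaKeyFor and the preference order are assumptions for illustration only.

```javascript
// Attributes that may carry the tag's identifier, in an assumed order of preference.
const KEY_ATTRIBUTES = ['property', 'name', 'itemprop', 'http-equiv'];

// attribs: plain object of attribute name -> value for one <meta> tag
function metaKeyFor(attribs) {
  // HTML attribute names are case-insensitive, so normalize them defensively
  const lower = {};
  for (const [attr, value] of Object.entries(attribs)) {
    lower[attr.toLowerCase()] = value;
  }
  for (const attr of KEY_ATTRIBUTES) {
    const value = lower[attr];
    if (typeof value === 'string' && value.trim() !== '') {
      return { key: value.trim(), source: attr, content: lower.content };
    }
  }
  return null; // no usable identifier (for example, <meta charset="utf-8">)
}
```

Recording which attribute supplied the key (source) makes it easier to spot sites that, for instance, put Open Graph names in the name attribute.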

Common patterns:

  • Check for <link rel="canonical"> for the preferred URL
  • Look for <meta name="robots"> to understand indexing preferences
  • Find <meta name="author"> for content attribution
  • Extract structured data from JSON-LD script tags (though technically not meta tags)
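
Two of these patterns need a little post-processing once the raw values are extracted. As a hedged sketch (the function names are hypothetical, and the inputs are assumed to be the robots content string and the text of each application/ld+json script tag):

```javascript
// Split a robots content value such as "noindex, nofollow" into directives.
function parseRobotsDirectives(content) {
  return (content || '')
    .split(',')
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d !== '');
}

// Parse the text of JSON-LD script blocks.
// Invalid JSON is skipped rather than aborting the whole extraction.
function parseJsonLdBlocks(scriptTexts) {
  const items = [];
  for (const text of scriptTexts) {
    try {
      const data = JSON.parse(text);
      // A block may hold a single object or an array of objects
      items.push(...(Array.isArray(data) ? data : [data]));
    } catch {
      // malformed block: ignore and continue with the next one
    }
  }
  return items;
}
```

Skipping malformed JSON-LD silently is a design choice for this sketch; a real extractor might prefer to report which blocks failed to parse.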

Our Meta Tag Extractor automatically handles all these cases and presents meta tags in a structured, easy-to-read format.
