How do I use Cheerio to parse HTML in Node.js?

Cheerio is a fast HTML parser for Node.js with a jQuery-like API. You load an HTML string, select elements with CSS selectors, and read their text and attributes with familiar methods such as .text() and .attr().

Installation:

npm install cheerio axios

Basic usage:

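// Top-level await needs an ES module (a .mjs file or "type": "module" in package.json)
import axios from 'axios';
import * as cheerio from 'cheerio';

// Fetch the raw HTML; data holds the response body as a string
const { data } = await axios.get('https://example.com');

// Parse it once and get a jQuery-like $ bound to this document
const $ = cheerio.load(data);

// Now use $ like jQuery
const title = $('h1').text();
console.log(title);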

Selecting elements:

Use CSS selectors:

  • $('h1') - by tag
  • $('.classname') - by class
  • $('#id') - by ID
  • $('div > p') - child combinator
  • $('a[href]') - attribute selector
  • $('li:first-child') - pseudo-classes

Extracting data:

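  • .text() - Combined text of all matched elements and their descendants (HTML tags stripped)
  • .html() - Inner HTML of the first matched element
  • .attr('name') - Value of an attribute (like href or src) on the first matched element
  • .val() - Current value of form elements such as input, select, and textarea
  • .data('key') - Value of a data-* attribute (for example, data-key)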

Iterating over multiple elements:

// Using .each()
$('a').each((i, el) => {
  const href = $(el).attr('href');
  console.log(href);
});

// Using .map()
const links = $('a')
  .map((i, el) => $(el).attr('href'))
  .toArray();

Traversing the DOM:

  • .find('selector') - Search descendants
  • .parent() - Get the parent element
  • .children() - Get direct children
  • .siblings() - Get sibling elements
  • .next() and .prev() - The immediately following and preceding sibling elements

Common patterns:

Extract all links:

const links = $('a[href]')
  .map((i, el) => $(el).attr('href'))
  .toArray();

Extract table data:

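const rows = [];
$('table tr').each((i, row) => {
  // Include header cells (th) as well as data cells (td), and trim surrounding whitespace
  const cells = $(row).find('th, td')
    .map((j, cell) => $(cell).text().trim())
    .toArray();
  rows.push(cells);
});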

Extract structured data:

const products = [];
$('.product').each((i, el) => {
  const card = $(el);
  products.push({
    name: card.find('.name').text().trim(),
    price: card.find('.price').text().trim(),
    image: card.find('img').attr('src') // undefined if the product has no <img>
  });
});

Limitations:

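  • Cheerio only parses static HTML - it doesn't run the page's JavaScript
  • Content that the browser adds later with client-side scripts (for example, data fetched after page load) won't be in the HTML Cheerio receives
  • For JavaScript-heavy sites, use a headless browser such as Puppeteer or Playwright instead
  • cheerio.load() only parses HTML you pass in - fetch the page first with Axios, Node's built-in fetch (Node 18+), or another HTTP client (Cheerio 1.0 also adds a fromURL() helper that fetches and loads in one step)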

Performance tips:

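  • Call cheerio.load() once per document and reuse the returned $ instead of re-parsing the same HTML
  • Prefer specific selectors (for example, '#results a' rather than 'a') so fewer elements are matched and processed
  • Cache selections you reuse, and scope searches with .find() on a container instead of querying the whole document repeatedly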
