How do I use Cheerio to parse HTML in Node.js?
Cheerio is a fast, jQuery-like library for parsing and querying HTML in Node.js. It implements a subset of the jQuery API on the server, which makes extracting data from HTML straightforward.
Installation:
npm install cheerio axios
Basic usage:
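import axios from 'axios';
import * as cheerio from 'cheerio';
// Top-level await requires an ES module (a .mjs file or "type": "module" in package.json)
const { data } = await axios.get('https://example.com');
// Parse the HTML string into a queryable document
const $ = cheerio.load(data);
// Now use $ like jQuery
const title = $('h1').text();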
Selecting elements:
Use CSS selectors:
- $('h1'): by tag
- $('.classname'): by class
- $('#id'): by ID
- $('div > p'): child combinator
- $('a[href]'): attribute selector
- $('li:first-child'): pseudo-selectors
Extracting data:
- .text(): Get text content (strips HTML tags)
- .html(): Get inner HTML
- .attr('name'): Get attribute values (like href, src)
- .val(): For form input values
- .data('key'): For data attributes
Iterating over multiple elements:
// Using .each()
$('a').each((i, el) => {
  const href = $(el).attr('href');
  console.log(href);
});
// Using .map()
const links = $('a')
  .map((i, el) => $(el).attr('href'))
  .toArray();
Traversing the DOM:
- .find('selector'): Search descendants
- .parent(): Get the parent element
- .children(): Get direct children
- .siblings(): Get sibling elements
- .next() and .prev(): Adjacent siblings
Common patterns:
Extract all links:
const links = $('a[href]')
  .map((i, el) => $(el).attr('href'))
  .toArray();
Extract table data:
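const rows = [];
$('table tr').each((i, row) => {
  // Collect the text of every <td> in this row
  const cells = $(row).find('td')
    .map((j, cell) => $(cell).text())
    .toArray();
  // Note: a header row containing only <th> cells yields an empty array
  rows.push(cells);
});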
Extract structured data:
const products = [];
$('.product').each((i, el) => {
  products.push({
    name: $(el).find('.name').text(),
    price: $(el).find('.price').text(),
    image: $(el).find('img').attr('src')
  });
});
Limitations:
- Cheerio only parses static HTML - it doesn't execute JavaScript
- Content loaded dynamically won't be available
- For JavaScript-heavy sites, use a headless browser such as Puppeteer instead
- Cheerio doesn't make HTTP requests itself - you need Axios or another HTTP client to fetch the HTML first
Performance tips:
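- Reuse the loaded Cheerio instance: call cheerio.load() once per document and keep the returned $ instead of re-parsing the same HTML
- Use specific selectors (an ID or class) instead of broad ones such as $('div') or $('*')
- Avoid unnecessary DOM traversal: store a selection in a variable rather than running the same query repeatedly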