What is the best way to convert HTML tables to CSV?

The best way to convert HTML tables to CSV depends on your technical skill level and requirements.

Non-coding approach (fastest):

Use a table extractor tool:

  1. Upload your HTML file
  2. Preview extracted tables
  3. Download as CSV directly

Python with pandas (recommended):

import csv

import pandas as pd

# Extract all tables (read_html needs an HTML parser backend installed
# alongside pandas, typically lxml, or html5lib with BeautifulSoup)
tables = pd.read_html('page.html')

# Export first table to CSV
tables[0].to_csv('output.csv', index=False)

# Export with custom settings
tables[0].to_csv('output.csv',
                 index=False,
                 encoding='utf-8',
                 sep=',',
                 quoting=csv.QUOTE_MINIMAL)

JavaScript (Node.js):

const fs = require('fs');
const cheerio = require('cheerio');
const { parse } = require('json2csv'); // classic json2csv API (v5 and earlier)

// Load the HTML file and parse it with cheerio
const html = fs.readFileSync('page.html', 'utf8');
const $ = cheerio.load(html);

// Walk every table row and collect cell text, keyed by column index
const data = [];
$('table tr').each((i, row) => {
  const rowData = {};
  $(row).find('td').each((j, cell) => {
    rowData[`col${j}`] = $(cell).text().trim();
  });
  // Rows with only <th> cells (or no cells) produce an empty object; skip them
  if (Object.keys(rowData).length) data.push(rowData);
});

// Convert the array of row objects to CSV text and write it out
const csv = parse(data);
fs.writeFileSync('output.csv', csv);
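Both dependencies install from npm (npm install cheerio json2csv). Note that this sketch keys columns by position (col0, col1, ...); if you want the <th> header text as CSV column names, map the header row to keys first.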

Handling common issues (a combined sketch follows this list):

  • Encoding: Always use UTF-8 for special characters
  • Delimiter conflicts: Use tab or semicolon if data contains commas
  • Quotes: Wrap fields containing commas or newlines in quotes
  • Headers: Extract from <th> tags or first row
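
As a rough illustration, the pandas route above can address all four points at once; the semicolon delimiter and file names here are placeholder choices, not requirements:

import csv

import pandas as pd

# read_html promotes <th> cells (or the first row) to DataFrame column headers
df = pd.read_html('page.html')[0]

# UTF-8 output, a semicolon delimiter to avoid clashing with commas in the data,
# and quoting applied only to fields containing the delimiter, quotes, or newlines
df.to_csv('output.csv',
          index=False,
          encoding='utf-8',
          sep=';',
          quoting=csv.QUOTE_MINIMAL)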

Advanced considerations (a sketch for the multi-table case follows this list):

  • Large tables: Process in chunks to avoid memory issues
  • Multiple tables: Export each to separate CSV or combine with identifiers
  • Data cleaning: Remove HTML entities, normalize whitespace
  • Type preservation: CSV loses type information (everything becomes text)
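
For example, handling several tables with the pandas approach might look like the sketch below; the file names and the source_table column name are just illustrative choices:

import pandas as pd

tables = pd.read_html('page.html')

# Option 1: one CSV per table
for i, df in enumerate(tables):
    df.to_csv(f'table_{i}.csv', index=False)

# Option 2: one combined CSV, with a column identifying the source table
combined = pd.concat(
    [df.assign(source_table=i) for i, df in enumerate(tables)],
    ignore_index=True,
)
combined.to_csv('combined.csv', index=False)

# For very large tables, to_csv accepts chunksize=<rows> to write in batches
combined.to_csv('combined.csv', index=False, chunksize=50_000)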

When to use CSV vs JSON (a short code comparison follows the lists):

CSV advantages:

  • Smaller file size
  • Direct import to Excel/Sheets
  • Better for tabular data

JSON advantages:

  • Preserves data types
  • Better for nested structures
  • Easier to work with in code
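
With pandas, either export is a one-liner from the same DataFrame, so it is easy to produce whichever format the consumer needs (file names are placeholders):

# CSV: compact, opens directly in Excel/Sheets, but every value round-trips as text
tables[0].to_csv('output.csv', index=False)

# JSON: one object per row; numbers and booleans keep their types
tables[0].to_json('output.json', orient='records', indent=2)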
