What is the best way to convert HTML tables to CSV?
The best approach to converting HTML tables to CSV depends on your technical skill level and requirements.
Non-coding approach (fastest):
Use a table extractor tool:
- Upload your HTML file
- Preview extracted tables
- Download as CSV directly
Python with pandas (recommended):
import csv  # needed for csv.QUOTE_MINIMAL below
import pandas as pd

# Extract all tables from the page into a list of DataFrames
tables = pd.read_html('page.html')

# Export the first table to CSV
tables[0].to_csv('output.csv', index=False)

# Export with custom settings
tables[0].to_csv('output.csv',
                 index=False,
                 encoding='utf-8',
                 sep=',',
                 quoting=csv.QUOTE_MINIMAL)
JavaScript (Node.js):
const fs = require('fs');
const cheerio = require('cheerio');
const { parse } = require('json2csv');

const html = fs.readFileSync('page.html', 'utf8');
const $ = cheerio.load(html);

const data = [];
$('table tr').each((i, row) => {
  const rowData = {};
  // Include both header (th) and data (td) cells
  $(row).find('td, th').each((j, cell) => {
    rowData[`col${j}`] = $(cell).text().trim();
  });
  if (Object.keys(rowData).length) data.push(rowData);
});

const csv = parse(data);
fs.writeFileSync('output.csv', csv);
Handling common issues:
- Encoding: Always use UTF-8 for special characters
- Delimiter conflicts: Use a tab or semicolon delimiter if your data contains many commas, or rely on proper quoting
- Quotes: Wrap fields containing commas or newlines in quotes
- Headers: Extract column names from <th> tags or the first row
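The encoding, delimiter, and quoting points above can be sketched in pandas. This is a minimal, self-contained example using an in-memory table (with a real page you would start from pd.read_html as shown earlier):

```python
import csv
import pandas as pd

# A small table containing the awkward cases: an embedded comma,
# an embedded newline, and a non-ASCII character.
df = pd.DataFrame({
    'name': ['Ada, Countess', 'Grace'],
    'note': ['line one\nline two', 'caf\u00e9'],
})

# UTF-8 encoding, semicolon delimiter, minimal quoting: only fields
# containing the delimiter, quotes, or newlines get wrapped in quotes.
df.to_csv('output.csv',
          index=False,
          encoding='utf-8',
          sep=';',
          quoting=csv.QUOTE_MINIMAL)

# Round-trip check: reading it back preserves the awkward values.
back = pd.read_csv('output.csv', sep=';', encoding='utf-8')
print(back.equals(df))
```

The semicolon delimiter here is just a demonstration; with correct quoting, a comma delimiter handles comma-containing fields equally well.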
Advanced considerations:
- Large tables: Process in chunks to avoid memory issues
- Multiple tables: Export each to separate CSV or combine with identifiers
- Data cleaning: Remove HTML entities, normalize whitespace
- Type preservation: CSV loses type information (everything becomes text)
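Two of these considerations can be sketched in pandas (file names, the table_id column, and the chunk size are illustrative choices, not fixed conventions):

```python
import pandas as pd

# Multiple tables: export each to its own CSV, tagged with an
# identifier column so the source table is recoverable after a
# later combine.
tables = [pd.DataFrame({'a': range(3)}), pd.DataFrame({'a': range(3, 6)})]
for i, t in enumerate(tables):
    t.assign(table_id=i).to_csv(f'table_{i}.csv', index=False)

# Large tables: to_csv(chunksize=...) writes rows in batches rather
# than building the entire CSV string in memory at once.
big = pd.DataFrame({'a': range(100_000)})
big.to_csv('big.csv', index=False, chunksize=10_000)
```

Combining the tagged files later is then a simple pd.concat over pd.read_csv calls, with table_id distinguishing the original tables.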
When to use CSV vs JSON:
CSV advantages:
- Smaller file size
- Direct import to Excel/Sheets
- Better for tabular data
JSON advantages:
- Preserves data types
- Better for nested structures
- Easier to work with in code