How do I convert HTML tables to JSON format?
Converting HTML tables to JSON provides structured data that's easier to work with in modern applications.
Python with pandas (simplest):
import pandas as pd
import json
# Extract table
tables = pd.read_html(html)
df = tables[0]
# Convert to JSON (various formats)
# Records format (list of objects)
json_data = df.to_json(orient='records')
# [{"col1": "val1", "col2": "val2"}, ...]
# Values format (2D array)
json_data = df.to_json(orient='values')
# [["val1", "val2"], ["val3", "val4"], ...]
# Table format (with schema)
json_data = df.to_json(orient='table')
# Save to file
df.to_json('output.json', orient='records', indent=2)
JavaScript with Cheerio:
const cheerio = require('cheerio');
const $ = cheerio.load(html);
// Extract headers
const headers = [];
$('table thead th').each((i, el) => {
headers.push($(el).text().trim());
});
// Extract rows as array of objects
const data = [];
$('table tbody tr').each((i, row) => {
const rowData = {};
$(row).find('td').each((j, cell) => {
rowData[headers[j]] = $(cell).text().trim();
});
data.push(rowData);
});
// Save as JSON
const fs = require('fs');
fs.writeFileSync('output.json', JSON.stringify(data, null, 2));
Different JSON formats:
Records (most common):
[
{"name": "John", "age": 30, "city": "NYC"},
{"name": "Jane", "age": 25, "city": "LA"}
]
Columns:
{
"name": ["John", "Jane"],
"age": [30, 25],
"city": ["NYC", "LA"]
}
Index (keyed by row index):
{
"0": {"name": "John", "age": 30},
"1": {"name": "Jane", "age": 25}
}
Advantages of JSON over CSV:
- Preserves data types (numbers, booleans, null)
- Supports nested structures
- Better for APIs and web applications
- No delimiter conflicts
- Can include metadata
Type preservation:
JSON maintains types while CSV converts everything to strings:
df = df.infer_objects() # Infer types before export
df.to_json('output.json', orient='records')
Handling special values:
# Handle NaN/None values
df.to_json('output.json', orient='records', force_ascii=False, date_format='iso')
When to use JSON:
- Building APIs
- Consuming data in JavaScript
- Need to preserve data types
- Complex or nested data structures
- Integrating with NoSQL databases
When to use CSV instead:
- Importing to Excel/Google Sheets
- Simple tabular data
- Need smallest file size
- End users prefer spreadsheet format