How do I convert HTML tables to JSON format?

Converting HTML tables to JSON provides structured data that's easier to work with in modern applications.

Python with pandas (simplest):

import pandas as pd
import json

# Extract table
tables = pd.read_html(html)
df = tables[0]

# Convert to JSON (various formats)

# Records format (list of objects)
json_data = df.to_json(orient='records')
# [{"col1": "val1", "col2": "val2"}, ...]

# Values format (2D array)
json_data = df.to_json(orient='values')
# [["val1", "val2"], ["val3", "val4"], ...]

# Table format (with schema)
json_data = df.to_json(orient='table')

# Save to file
df.to_json('output.json', orient='records', indent=2)

JavaScript with Cheerio:

const cheerio = require('cheerio');
const $ = cheerio.load(html);

// Extract headers
const headers = [];
$('table thead th').each((i, el) => {
  headers.push($(el).text().trim());
});

// Extract rows as array of objects
const data = [];
$('table tbody tr').each((i, row) => {
  const rowData = {};
  $(row).find('td').each((j, cell) => {
    rowData[headers[j]] = $(cell).text().trim();
  });
  data.push(rowData);
});

// Save as JSON
const fs = require('fs');
fs.writeFileSync('output.json', JSON.stringify(data, null, 2));

Different JSON formats:

Records (most common):

[
  {"name": "John", "age": 30, "city": "NYC"},
  {"name": "Jane", "age": 25, "city": "LA"}
]

Columns:

{
  "name": ["John", "Jane"],
  "age": [30, 25],
  "city": ["NYC", "LA"]
}

Index (keyed by row index):

{
  "0": {"name": "John", "age": 30},
  "1": {"name": "Jane", "age": 25}
}

Advantages of JSON over CSV:

  • Preserves data types (numbers, booleans, null)
  • Supports nested structures
  • Better for APIs and web applications
  • No delimiter conflicts
  • Can include metadata

Type preservation:

JSON maintains types while CSV converts everything to strings:

df = df.infer_objects()  # Infer types before export
df.to_json('output.json', orient='records')

Handling special values:

# Handle NaN/None values
df.to_json('output.json', orient='records', force_ascii=False, date_format='iso')

When to use JSON:

  • Building APIs
  • Consuming data in JavaScript
  • Need to preserve data types
  • Complex or nested data structures
  • Integrating with NoSQL databases

When to use CSV instead:

  • Importing to Excel/Google Sheets
  • Simple tabular data
  • Need smallest file size
  • End users prefer spreadsheet format

Related Questions