How do I create robust CSS selectors that won't break?

Robust CSS selectors keep a scraper working as a site's markup changes, which cuts down on maintenance.

Prioritize stable attributes:

  • Look for data-testid, data-component, or aria-label attributes that are less likely to change
  • Use semantic class names that describe content (e.g., product-title) rather than styling (e.g., text-lg-bold)
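  • Prefer IDs when available, since they should be unique within a page; skip ones that look auto-generated (e.g., long hashes or counters), which can change between builds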

Avoid brittle patterns:

  • Don't rely on nth-child selectors unless absolutely necessary, since inserting or reordering a sibling changes what they match
  • Avoid selectors based purely on visual styling classes (btn-primary, text-lg) as these often change with design updates
  • Keep selectors as short as possible while maintaining specificity
  • Use direct child (>) and descendant combinators sparingly, since each one ties the selector to the page's current nesting
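
The patterns above can be checked mechanically before a selector goes into a scraper. The following is a minimal sketch, not a complete linter: the function name brittleness_warnings, the list of styling-class prefixes, and the depth limit of 3 are all assumptions chosen for illustration and should be tuned to the target site.

```python
import re

# Hypothetical list of styling/utility class prefixes; adjust for the target site.
STYLING_CLASS = re.compile(r"\.(?:btn|text|bg|font|m|p)-[\w-]+")
# Positional pseudo-classes such as :nth-child(...), :nth-of-type(...), :first-child.
POSITIONAL = re.compile(r":(?:nth-(?:last-)?(?:child|of-type)\(|first-child|last-child)")


def brittleness_warnings(selector: str, max_depth: int = 3) -> list:
    """Return a list of warnings for patterns that tend to break over time."""
    warnings = []
    if POSITIONAL.search(selector):
        warnings.append("positional pseudo-class")
    if STYLING_CLASS.search(selector):
        warnings.append("styling class")
    # Rough depth estimate: split on '>' or whitespace combinators.
    # This ignores edge cases such as spaces inside quoted attribute values.
    parts = [p for p in re.split(r"\s*>\s*|\s+", selector.strip()) if p]
    if len(parts) > max_depth:
        warnings.append("deep chain")
    return warnings


if __name__ == "__main__":
    for sel in ("div > div > div:nth-child(3) > span",
                "a.btn-primary",
                '[data-testid="product-price"]'):
        print(sel, "->", brittleness_warnings(sel))
```

A check like this only catches surface patterns; it cannot tell whether a class name is semantic, so it complements rather than replaces testing against real pages.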

Advanced techniques:

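  • Combine multiple attributes to pin down an element without relying on its position: div[data-type="product"][class*="card"] (every condition must match)
  • Use attribute substring matching: [class*="product"] matches any element whose class attribute contains the text "product", which includes product-card but also unrelated names like byproduct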
  • Test selectors against multiple pages from the same site to verify consistency
  • Build fallback selectors for critical data points in case the primary selector fails
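
To make the substring-matching behavior concrete, here is a small sketch that emulates two attribute selectors in plain Python. It does not use a real selector engine; the helper names matches_substring and matches_token are hypothetical, and the functions only mirror how [class*="..."] and [class~="..."] (roughly what .classname does) treat a class attribute value.

```python
def matches_substring(class_attr: str, needle: str) -> bool:
    """Emulate [class*="needle"]: substring anywhere in the raw attribute value.

    An empty needle matches nothing, as in CSS.
    """
    return needle != "" and needle in class_attr


def matches_token(class_attr: str, token: str) -> bool:
    """Emulate [class~="token"]: one whole whitespace-separated class name."""
    return token in class_attr.split()


if __name__ == "__main__":
    for value in ("product-card featured", "byproduct-list", "card product"):
        print(repr(value),
              "substring:", matches_substring(value, "product"),
              "token:", matches_token(value, "product"))
```

The substring form survives renames such as product-card to product-card-v2, but it also matches byproduct-list, so it is worth pairing with a second condition or checking the match count.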

Example progression from brittle to robust:

  • Brittle: div > div > div:nth-child(3) > span
  • Better: div.product-card span.price
  • Best: [data-testid="product-price"]
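
The same progression can double as a fallback order: try the most stable selector first and fall back to the less stable ones only if it finds nothing. The sketch below assumes a select callable that takes a selector string and returns a list of matches (for example, BeautifulSoup's soup.select could be passed in); select_with_fallback and PRICE_SELECTORS are illustrative names, not part of any library.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Ordered from most to least stable, following the progression above.
PRICE_SELECTORS = [
    '[data-testid="product-price"]',
    "div.product-card span.price",
]


def select_with_fallback(
    select: Callable[[str], Sequence],
    selectors: Sequence[str],
) -> Tuple[Optional[str], List]:
    """Try each selector in order; return (selector, matches) for the first hit.

    Returns (None, []) when no selector matches anything.
    """
    for selector in selectors:
        matches = select(selector)
        if matches:
            return selector, list(matches)
    return None, []


if __name__ == "__main__":
    # Stand-in for a parsed page where the data-testid attribute is missing.
    fake_page = {"div.product-card span.price": ["$19.99"]}
    used, found = select_with_fallback(lambda s: fake_page.get(s, []), PRICE_SELECTORS)
    print(used, found)
```

Returning the selector that matched makes it easy to log when the primary selector has stopped working, which is usually the first sign that the site's markup has changed.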

The key is balancing specificity with flexibility, targeting elements by their semantic meaning rather than their position in the DOM.
