How do I create robust CSS selectors that won't break?
Creating robust CSS selectors is crucial for maintaining scrapers over time as websites change.
Prioritize stable attributes:
- Look for `data-testid`, `data-component`, or `aria-label` attributes that are less likely to change
- Use semantic class names that describe content (e.g., `product-title`) rather than styling (e.g., `text-lg-bold`)
- Prefer IDs when available, as they should be unique and stable
Avoid brittle patterns:
- Don't rely on `nth-child` selectors unless absolutely necessary
- Avoid selectors based purely on visual styling classes (`btn-primary`, `text-lg`), as these often change with design updates
- Keep selectors as short as possible while maintaining specificity
- Use direct child (`>`) or descendant relationships sparingly
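The brittle patterns above can be flagged automatically before a selector goes into a scraper. The following is a minimal sketch, not a complete linter: the function name `brittleness_warnings`, the threshold of two `>` combinators, and the `btn-`/`text-` styling prefixes are all assumptions chosen for illustration and should be tuned per site.

```python
import re

# Assumed styling-class prefixes, taken from the examples in the list above.
# Real sites use other prefixes; extend this pattern as needed.
STYLING_CLASS = re.compile(r"\.(?:btn|text)-[\w-]+")

# Positional pseudo-classes such as :nth-child(3) or :nth-of-type(2).
POSITIONAL = re.compile(r":nth-(?:child|of-type)\(")


def brittleness_warnings(selector: str) -> list:
    """Return heuristic warnings for patterns that tend to break."""
    warnings = []
    if POSITIONAL.search(selector):
        warnings.append("positional pseudo-class")
    # Arbitrary threshold: two or more '>' combinators counts as a long chain.
    if selector.count(">") >= 2:
        warnings.append("long direct-child chain")
    if STYLING_CLASS.search(selector):
        warnings.append("styling class")
    return warnings
```

A check like this is purely textual, so it can misfire (for example, on a `>` inside an attribute value). It is best treated as a review aid rather than a hard gate.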
Advanced techniques:
- Combine multiple attributes for resilience: `element[data-type="product"][class*="card"]`
- Use attribute substring matching: `[class*="product"]` matches any element whose class attribute contains the substring "product"
- Test selectors against multiple pages from the same site to verify consistency
- Build fallback selectors for critical data points in case the primary selector fails
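Fallback selectors can be expressed as an ordered list that is tried from most to least preferred. The sketch below is library-agnostic and makes some assumptions: `select_with_fallbacks` and `PRICE_SELECTORS` are hypothetical names, and `select_one` stands for any callable that takes a CSS selector and returns an element or `None` (BeautifulSoup's `soup.select_one` is one such callable).

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical selector list for a price field, most robust first.
PRICE_SELECTORS = [
    '[data-testid="product-price"]',
    "div.product-card span.price",
]


def select_with_fallbacks(
    select_one: Callable[[str], Optional[T]],
    selectors: Iterable[str],
) -> Tuple[Optional[T], Optional[str]]:
    """Try each selector in order; return (element, selector_that_matched).

    Returns (None, None) when no selector matches.
    """
    for selector in selectors:
        element = select_one(selector)
        if element is not None:
            return element, selector
    return None, None
```

Returning the selector that matched makes it possible to log when the primary selector stops working. Running the same call over several pages from the site shows how consistently each selector holds up.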
Example progression from brittle to robust:
- Brittle: `div > div > div:nth-child(3) > span`
- Better: `div.product-card span.price`
- Best: `[data-testid="product-price"]`
The key is balancing specificity with flexibility, targeting elements by their semantic meaning rather than their position in the DOM.