What are common XPath mistakes to avoid?

Understanding common XPath mistakes helps you write more reliable selectors.

1. Using absolute paths:

Bad:

/html/body/div[1]/div[2]/p

Breaks as soon as the HTML structure changes, for example when a wrapper div is added.

Good:

//p[@class='description']

Matches the element wherever it sits in the document, so layout changes are less likely to break it.
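To make the difference concrete, here is a minimal sketch using lxml. The two HTML snippets are invented for illustration: the second is the same page after a wrapper div was added.

```python
from lxml import html

# Two versions of the same (made-up) page; the second adds a wrapper <div>.
before = html.fromstring(
    "<html><body><div><p class='description'>Blue widget</p></div></body></html>"
)
after = html.fromstring(
    "<html><body><div><div><p class='description'>Blue widget</p></div></div></body></html>"
)

absolute = "/html/body/div/p/text()"
relative = "//p[@class='description']/text()"

# The absolute path only works on the layout it was written for.
absolute_results = [doc.xpath(absolute) for doc in (before, after)]
# The attribute-based path keeps working after the layout change.
relative_results = [doc.xpath(relative) for doc in (before, after)]

print(absolute_results)  # [['Blue widget'], []]
print(relative_results)  # [['Blue widget'], ['Blue widget']]
```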

2. Forgetting text() for text extraction:

Wrong (returns element):

title = tree.xpath('//h1[@class="title"]')[0]
# Returns: <Element h1>

Right (returns text):

title = tree.xpath('//h1[@class="title"]/text()')[0]
# Returns: "Product Title"

3. Class attribute matching issues:

Fragile (matches only when the whole attribute value is exactly "product"):

//div[@class='product']

Fails for <div class="product featured">

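Right (flexible):

//div[contains(@class, 'product')]

Note that contains() is a substring test, so it also matches class="product-list". To match "product" as a whole class name:

//div[contains(concat(' ', normalize-space(@class), ' '), ' product ')]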
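The three ways of matching a class can be compared side by side. This is a small sketch with lxml on invented markup; the third expression is the usual XPath 1.0 idiom for matching one whole class name.

```python
from lxml import html

# Made-up markup with three different class values.
doc = html.fromstring(
    "<html><body>"
    "<div class='product'>A</div>"
    "<div class='product featured'>B</div>"
    "<div class='product-list'>C</div>"
    "</body></html>"
)

# Exact comparison: only the div whose class is exactly "product".
exact = doc.xpath("//div[@class='product']/text()")

# Substring test: also matches "product-list".
substring = doc.xpath("//div[contains(@class, 'product')]/text()")

# Whole-token test: pad the attribute with spaces and look for " product ".
token = doc.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' product ')]/text()"
)

print(exact)      # ['A']
print(substring)  # ['A', 'B', 'C']
print(token)      # ['A', 'B']
```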

4. Zero-based positions:

Wrong (XPath positions start at 1, not 0):

//li[0]  # Returns nothing

Right:

//li[1]  # First li within each parent
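A quick check with lxml shows how XPath positions and Python list indexes differ. The list below is made up for the example.

```python
from lxml import html

doc = html.fromstring(
    "<html><body><ul><li>a</li><li>b</li><li>c</li></ul></body></html>"
)

zero = doc.xpath("//li[0]/text()")               # no node has position 0
first = doc.xpath("//li[1]/text()")              # XPath positions start at 1
same = doc.xpath("//li[position()=1]/text()")    # long form of [1]
last = doc.xpath("//li[last()]/text()")          # last li of its parent

# The Python list returned by xpath() is still 0-indexed.
first_in_python = doc.xpath("//li/text()")[0]

print(zero, first, same, last, first_in_python)
```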

5. Confusion between // and /:

# All divs anywhere
//div

# A div only if it is the document's root element
# (never matches in HTML, where the root is html)
/div

# All p within specific div
//div[@id='content']//p

# Direct p children only
//div[@id='content']/p
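The following sketch runs the same kinds of expressions with lxml against a small invented page, one with a direct p child and one nested a level deeper.

```python
from lxml import html

doc = html.fromstring(
    "<html><body>"
    "<div id='content'>"
    "<p>direct</p>"
    "<div><p>nested</p></div>"
    "</div>"
    "</body></html>"
)

# // after the div: p elements at any depth below it.
descendants = doc.xpath("//div[@id='content']//p/text()")

# / after the div: only p elements that are immediate children.
children = doc.xpath("//div[@id='content']/p/text()")

# A leading single / starts at the document root, which is <html> here.
root_div = doc.xpath("/div")

print(descendants)  # ['direct', 'nested']
print(children)     # ['direct']
print(root_div)     # []
```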

6. Not handling empty results:

Unsafe:

price = tree.xpath('//span[@class="price"]/text()')[0]
# IndexError if not found

Safe:

prices = tree.xpath('//span[@class="price"]/text()')
price = prices[0] if prices else None

Better (Scrapy):

price = response.xpath('//span[@class="price"]/text()').get()
# Returns None if not found

7. Confusing text() and string():

# text() - direct text nodes only
//div/text()

# string() - concatenated text of the first matching div, including its descendants
string(//div)
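As a rough illustration with lxml (the markup is invented), text() returns a list of the div's own text nodes, while string() returns one string that also includes the text inside child elements.

```python
from lxml import html

doc = html.fromstring(
    "<html><body><div>Price: <b>19.99</b> USD</div></body></html>"
)

# text(): a list with the div's own text nodes; the <b> content is skipped.
text_nodes = doc.xpath("//div/text()")

# string(): a single string built from the first matching div and everything inside it.
full_text = doc.xpath("string(//div)")

print([t.strip() for t in text_nodes])  # ['Price:', 'USD']
print(full_text)                        # Price: 19.99 USD
```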

8. Wrong axis for siblings:

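Wrong (steps up to the parent first, so it returns siblings of the h2's parent):

//h2/../following-sibling::p

Right (p elements that follow the h2 under the same parent):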

//h2/following-sibling::p
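Both expressions are valid XPath; they just select different nodes. A small lxml sketch on made-up markup shows which p each one returns.

```python
from lxml import html

doc = html.fromstring(
    "<html><body>"
    "<div><h2>Specs</h2><p>inside</p></div>"
    "<p>outside</p>"
    "</body></html>"
)

# Siblings of the h2 itself (same parent div).
siblings_of_h2 = doc.xpath("//h2/following-sibling::p/text()")

# Going up with .. first selects siblings of the parent div instead.
siblings_of_parent = doc.xpath("//h2/../following-sibling::p/text()")

# Add [1] to keep only the nearest following sibling.
next_only = doc.xpath("//h2/following-sibling::p[1]/text()")

print(siblings_of_h2)      # ['inside']
print(siblings_of_parent)  # ['outside']
print(next_only)           # ['inside']
```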

9. Position predicates without parentheses:

Per parent:

//div/p[@class='intro'][1]

Returns the first p with class intro inside each div, so it can match several elements.

Whole document:

(//div/p[@class='intro'])[1]

Returns only the first matching p in the entire document.
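The effect of the parentheses is easiest to see on a page with more than one div. The sketch below uses lxml and invented markup.

```python
from lxml import html

doc = html.fromstring(
    "<html><body>"
    "<div><p class='intro'>a1</p><p class='intro'>a2</p></div>"
    "<div><p class='intro'>b1</p></div>"
    "</body></html>"
)

# Without parentheses, [1] is evaluated separately inside each div.
per_parent = doc.xpath("//div/p[@class='intro'][1]/text()")

# With parentheses, [1] applies to the combined result set.
overall_first = doc.xpath("(//div/p[@class='intro'])[1]/text()")
overall_last = doc.xpath("(//div/p[@class='intro'])[last()]/text()")

print(per_parent)     # ['a1', 'b1']
print(overall_first)  # ['a1']
print(overall_last)   # ['b1']
```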

10. Not normalizing whitespace:

# Fails when the text has leading or trailing whitespace
//button[text()='Submit']

# Tolerates whitespace variations and text nested in child elements
//button[normalize-space()='Submit']
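One detail worth knowing: in XPath 1.0, normalize-space(text()) only looks at the first direct text node, while normalize-space() with no argument uses the element's whole text. A short lxml sketch on invented buttons shows the three variants.

```python
from lxml import html

doc = html.fromstring(
    "<html><body>"
    "<button>  Submit\n</button>"            # padded text
    "<button><span>Submit</span></button>"   # text inside a child element
    "<button>Cancel</button>"
    "</body></html>"
)

# Exact comparison: the padded text is not equal to 'Submit'.
exact = doc.xpath("//button[text()='Submit']")

# First direct text node, trimmed: finds the padded button only.
first_text_node = doc.xpath("//button[normalize-space(text())='Submit']")

# Whole string value, trimmed: also finds the button with the nested span.
whole_value = doc.xpath("//button[normalize-space()='Submit']")

print(len(exact), len(first_text_node), len(whole_value))  # 0 1 2
```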

11. Attribute vs text confusion:

# Wrong - an attribute has no text() children, so this returns nothing
//img/@src/text()

# Right
//img/@src
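With lxml, selecting an attribute already gives you strings, as this small sketch on made-up markup shows.

```python
from lxml import html

doc = html.fromstring(
    "<html><body><img src='/a.png' alt='A'><img src='/b.png'></body></html>"
)

# Attribute selection returns the attribute values directly.
srcs = doc.xpath("//img/@src")

# Attributes have no child nodes, so adding /text() selects nothing.
wrong = doc.xpath("//img/@src/text()")

# Elements without the attribute are simply skipped.
alts = doc.xpath("//img/@alt")

print(srcs)   # ['/a.png', '/b.png']
print(wrong)  # []
print(alts)   # ['A']
```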

12. Case sensitivity:

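XPath name tests are case-sensitive:

# Different results
//DIV  # Matches only if the parsed tree contains uppercase DIV (possible in XML)
//div  # Matches HTML elements, because HTML parsers lowercase tag names

For HTML, always write element names in lowercase.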
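The sketch below parses the same uppercase tag with lxml's HTML parser and with its XML parser. The HTML parser lowercases tag names, while the XML parser keeps them as written.

```python
from lxml import etree, html

html_doc = html.fromstring("<html><body><DIV>shout</DIV></body></html>")
xml_doc = etree.fromstring("<root><DIV>shout</DIV></root>")

# HTML: the tag was lowercased during parsing.
html_lower = html_doc.xpath("//div/text()")
html_upper = html_doc.xpath("//DIV/text()")

# XML: the tag keeps its original case.
xml_lower = xml_doc.xpath("//div/text()")
xml_upper = xml_doc.xpath("//DIV/text()")

print(html_lower, html_upper)  # ['shout'] []
print(xml_lower, xml_upper)    # [] ['shout']
```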

Debugging tips:

  1. Test XPath in browser console: $x('//div[@class="product"]')
  2. Use XPath tester tools before implementing
  3. Start simple, add complexity gradually
  4. Check if elements exist before extracting
  5. Print results to verify data type and content

Best practices:

  • Keep paths short and relative
  • Use contains() for flexible matching
  • Always handle missing elements
  • Use normalize-space() for text matching
  • Test against multiple page samples
  • Add fallback selectors for resilience
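Several of these practices can be combined in one small helper. The sketch below is only one possible approach: first_text is a hypothetical name, the selectors and markup are invented, and it assumes every expression returns text or attribute strings.

```python
from lxml import html

def first_text(tree, xpaths, default=None):
    """Try each XPath in order; return the first non-empty string found.

    Hypothetical helper: assumes the expressions return strings
    (for example paths ending in /text() or /@attr).
    """
    for expr in xpaths:
        for value in tree.xpath(expr):
            text = value.strip()
            if text:
                return text
    return default

doc = html.fromstring(
    "<html><body><p class='cost'> $19.99 </p></body></html>"
)

price = first_text(doc, [
    "//span[@class='price']/text()",  # preferred selector (absent on this page)
    "//p[@class='cost']/text()",      # fallback selector
])
missing = first_text(doc, ["//span[@class='sku']/text()"], default="n/a")

print(price)    # $19.99
print(missing)  # n/a
```

Listing selectors from most to least specific keeps the preferred one in charge while still returning a value when the page layout differs.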
