Consider this HTML:
<div>
<p>
<span class="abc">Monitor</span> <b>$300</b>
</p>
<a href="/add">Add to cart</a>
</div>
<div>
<p>
<span class="abc">Keyboard</span> $20
</p>
<a href="/add">Add to cart</a>
</div>
Using XPath, I want to extract "Monitor $300" and "Keyboard $20". I use this XPath:
//div[a[contains(., "Add to cart")]]/p/text()
But it selects <span class="abc">Monitor</span> <b>$300</b>. I don’t want the tags. How do I get only the text?
You want to select all descendant text, not just child text:
//div[a[contains(., "Add to cart")]]/p//text()
Note the double slash between p and text() there. This will potentially also include a lot of inter-tag whitespace, though, so you'll need to clean that up. Example using lxml:
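A minimal sketch of such an example, assuming lxml is installed and the HTML from the question is held in a string. The names html_source and products are illustrative only and do not come from the original answer. It selects each p, collects all of its descendant text nodes, and collapses the whitespace between them:

```python
from lxml import html

# The markup from the question, held in a string (illustrative variable name).
html_source = """
<div>
<p>
<span class="abc">Monitor</span> <b>$300</b>
</p>
<a href="/add">Add to cart</a>
</div>
<div>
<p>
<span class="abc">Keyboard</span> $20
</p>
<a href="/add">Add to cart</a>
</div>
"""

tree = html.fromstring(html_source)

products = []
# Select each <p> inside a <div> that has an "Add to cart" link.
for p in tree.xpath('//div[a[contains(., "Add to cart")]]/p'):
    # .//text() returns every descendant text node, including
    # whitespace-only nodes between the tags.
    pieces = p.xpath('.//text()')
    # Join the pieces, then split and re-join to collapse runs of whitespace.
    products.append(" ".join("".join(pieces).split()))

print(products)  # ['Monitor $300', 'Keyboard $20']
```

Grouping by p first keeps the text of each product together. Running the //text() expression directly on the whole tree would instead return one flat list of text nodes for all products.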