Suppose I have this sort of HTML from which I need to select “text2” using lxml / ElementTree:
<div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>
If I already have the div element as mydiv, then mydiv.text returns just “text1”.
Using itertext() seems problematic or cumbersome at best since it walks the entire tree under the div.
Is there any simple/elegant way to extract a non-first text chunk from an element?
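Well, lxml.etree supports XPath 1.0, which lets you address the text nodes directly. For example, mydiv.xpath('text()') returns the div's own text chunks as a list, and mydiv.xpath('text()[2]') selects just “text2”.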
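Here is a minimal sketch of that approach, assuming lxml is installed and using the markup from the question. The variable names chunks and second are only for illustration. It also shows the non-XPath route: in lxml and ElementTree, text that follows a child element is stored on that child as its .tail attribute.

```python
from lxml import etree

html = "<div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>"
mydiv = etree.fromstring(html)

# text() selects only the text nodes that are direct children of the div,
# so the text inside the spans is not included.
chunks = mydiv.xpath("text()")
print(chunks)   # ['text1', 'text2', 'text3']

# XPath positions are 1-based, so text()[2] is the second direct text node.
second = mydiv.xpath("text()[2]")
print(second)   # ['text2']

# Without XPath: the text after the first <span> is that span's tail.
print(mydiv[0].tail)   # text2
```

Note that xpath() always returns a list here, so use second[0] (after checking that the list is not empty) if you need the string itself.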