I’m trying to parse some HTML with XPath. Following the simplified XML example below, I want to match the string ‘Text 1’, then grab the contents of the relevant content node.
<doc>
  <block>
    <title>Text 1</title>
    <content>Stuff I want</content>
  </block>
  <block>
    <title>Text 2</title>
    <content>Stuff I don't want</content>
  </block>
</doc>
My Python code throws a wobbly:
>>> from lxml import etree
>>>
>>> tree = etree.XML("<doc><block><title>Text 1</title><content>Stuff I want</content></block><block><title>Text 2</title><content>Stuff I don't want</content></block></doc>")
>>>
>>> # get all titles
... tree.xpath('//title/text()')
['Text 1', 'Text 2']
>>>
>>> # match 'Text 1'
... tree.xpath('//title/text()="Text 1"')
True
>>>
>>> # Follow parent from selected nodes
... tree.xpath('//title/text()/../..//text()')
['Text 1', 'Stuff I want', 'Text 2', "Stuff I don't want"]
>>>
>>> # Follow parent from selected node
... tree.xpath('//title/text()="Text 1"/../..//text()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1330, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:14542)
  File "xpath.pxi", line 287, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:90093)
  File "xpath.pxi", line 209, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:89446)
  File "xpath.pxi", line 194, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:89281)
lxml.etree.XPathEvalError: Invalid type
Is this possible in XPath? Do I need to express what I want to do in a different way?
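For what it's worth, here is a minimal sketch of one way this might be expressed, assuming the goal is the text of the content node that sits in the same block as the matching title. The comparison `text()="Text 1"` evaluates to a boolean rather than a node-set, which would explain why no further steps can follow it; moving the test into a predicate (square brackets) keeps the result a node-set. The variable names `wanted` and `wanted_via_title` are only illustrative.

```python
from lxml import etree

tree = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# Select the block whose title child equals 'Text 1', then step down
# to its content child. The test lives in a predicate, so the path
# still yields nodes instead of a boolean.
wanted = tree.xpath('//block[title="Text 1"]/content/text()')

# Same idea starting from the title: filter the title with a predicate,
# go up to the parent block, then down to content.
wanted_via_title = tree.xpath('//title[text()="Text 1"]/../content/text()')

print(wanted)            # ['Stuff I want']
print(wanted_via_title)  # ['Stuff I want']
```

Either form should work on the simplified document above; real HTML with namespaces or stray whitespace around the title text may need `normalize-space()` or a different parser.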