I need to parse some XML containing inline elements. The XML looks, for example, like this:
<section>
Fubar, I'm so fubar, fubar and even more <fref bar="baz">fubare</fref>. And yet more fubar.
</section>
If I now iterate over this structure with for elem in list(parent): ..., I only get access to fref. If I then process fref, the surrounding text is lost, since text isn’t a real element.
Does anybody know of a way to properly address this issue?
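This can be achieved with lxml. As the lxml.etree tutorial explains, text that precedes the first child element is stored in the parent's .text attribute, and text that follows an element is stored in that element's .tail attribute. There’s also an XPath way to do this; it’s described in the tutorial as well.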
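As a rough sketch of the .text/.tail approach applied to the XML from the question: the helper name walk_mixed_content is made up for illustration, and the fallback to the standard-library ElementTree is an assumption added so the snippet also runs without lxml installed (both expose .text and .tail on elements).

```python
try:
    from lxml import etree  # the library this answer refers to
except ImportError:
    # Assumption: fall back to the standard library, which offers the
    # same .text / .tail attributes on elements.
    import xml.etree.ElementTree as etree

XML = (
    "<section>\n"
    "Fubar, I'm so fubar, fubar and even more "
    '<fref bar="baz">fubare</fref>. And yet more fubar.\n'
    "</section>"
)


def walk_mixed_content(parent):
    """Yield ("text", str) and ("element", Element) pairs in document order.

    Hypothetical helper: it only walks one level of children.
    """
    if parent.text:
        # Text between the parent's start tag and its first child.
        yield ("text", parent.text)
    for child in parent:
        yield ("element", child)
        if child.tail:
            # Text between this child's end tag and the next sibling
            # (or the parent's end tag).
            yield ("text", child.tail)


section = etree.fromstring(XML)
for kind, value in walk_mixed_content(section):
    if kind == "text":
        print("text:", repr(value))
    else:
        print("element:", value.tag, dict(value.attrib), repr(value.text))
```

Each inline element is yielded in place between the text before and after it, so fref can be processed without losing the surrounding text. For nested inline elements, the same helper could be called recursively on each child.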