I'm trying to parse HTML with lxml in Python, and this is my output:
<td>
<span style="display:inline">text1</span>
<span style="display:none">text2</span>
<span>text3</span>
text4
</td>
I thought I was being clever by using the following:
tree = tr.xpath("//*[contains(@style,'inline')]/text()")
But then I realized that it only gives me text1.
What I want is to get text3 and text4 as well, so that the output is:
['text1', 'text3', 'text4']
Can anyone point me in the right direction?
Explicitly exclude anything with display:none, instead of selecting only elements whose style contains inline.
That said, this is only a distant approximation of what a browser would actually do; if you need strictly accurate results, you'd want to drive an actual browser (with Selenium, embedding APIs, or the like).
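The exclusion approach described above could look roughly like the following. This is a minimal sketch, not a definitive solution: it assumes the markup is parsed with lxml.html, and the table wrapper around the sample cell, the `visible_text` helper name, and the whitespace stripping are illustrative additions that do not come from the question.

```python
from lxml import html

# Sample markup from the question, wrapped in a table row (an assumption,
# made so that the <td> has a valid parent when parsed as HTML).
MARKUP = """<table><tr><td>
<span style="display:inline">text1</span>
<span style="display:none">text2</span>
<span>text3</span>
text4
</td></tr></table>"""


def visible_text(root):
    """Return stripped text nodes under any <td> that are not hidden.

    A text node is skipped when one of its ancestors has a style attribute
    containing the exact substring 'display:none'. This is a plain substring
    test, so variants such as 'display: none' (with a space) or rules coming
    from a stylesheet are NOT detected.
    """
    nodes = root.xpath(
        ".//td//text()[not(ancestor::*[contains(@style, 'display:none')])]"
    )
    # Drop whitespace-only nodes (the newlines between the spans) and trim the rest.
    return [node.strip() for node in nodes if node.strip()]


doc = html.fromstring(MARKUP)
print(visible_text(doc))
```

Selecting text nodes directly (rather than elements followed by `/text()`) is what lets the bare text4, which sits in the cell itself and not in a span, be picked up alongside text1 and text3.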