The xpath for text I wish to extract is reliably located deep in the tree at
...table/tbody/tr[4]/td[2]
Specifically, td[2] is structured like so:
<td class="val">xyz</td>
I am trying to extract the text “xyz”, but a broad search returns multiple results. For example, the following path returns 10 elements:
xpath('//td[@class="val"]')
… while a specific search doesn’t return any elements. I am unsure why the following returns nothing.
xpath('//tbody/tr/td[@class="val"]')
One solution involves walking down the tree by index:
table = root.xpath('//table[@class="123"]')
# going down the tree by child index: 4th row (tr[4]), then 2nd cell (td[2])
xyz = table[0][3][1]
print(xyz.text)
However, I am pretty sure this is extremely brittle. I would appreciate it if someone could tell me how to construct an xpath search that is both robust and relatively cheap on resources.
You haven’t mentioned it explicitly, but if your target table and td tag classes are reliable, then you could anchor the query on those classes rather than on element positions, and you half dodge the issue of tbody being there or not.
However, there’s no substitute for actually seeing the material you are trying to parse when recommending XPath queries…
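As a rough sketch of the class-based idea, a query might look something like the one below. The sample markup is made up (the real page isn't shown), and narrowing to tr[4] is an assumption based on the path given in the question. It also illustrates one likely reason the //tbody/... search comes back empty: browsers add tbody to the tree they display, but lxml doesn't insert it when the source HTML omits it.

```python
from lxml import html

# Hypothetical sample markup; the real page was not shown in the question.
# Note that the source contains no <tbody>, which is common in raw HTML.
SAMPLE = """
<html><body>
<table class="123">
  <tr><td class="key">a</td><td class="val">1</td></tr>
  <tr><td class="key">b</td><td class="val">2</td></tr>
  <tr><td class="key">c</td><td class="val">3</td></tr>
  <tr><td class="key">d</td><td class="val">xyz</td></tr>
</table>
<table class="other">
  <tr><td class="key">e</td><td class="val">noise</td></tr>
</table>
</body></html>
"""

root = html.fromstring(SAMPLE)

# Requiring tbody matches nothing here, because the parsed tree has no tbody.
no_tbody = root.xpath('//tbody/tr/td[@class="val"]')

# Anchor on the table class; the // between table and td matches
# whether or not a tbody sits in between.
vals = root.xpath('//table[@class="123"]//td[@class="val"]')

# Assumed refinement: keep only the cell in the fourth row of that table.
target = root.xpath('//table[@class="123"]//tr[4]/td[@class="val"]')
xyz = target[0].text if target else None
print(xyz)
```

Checking that the result list is non-empty before indexing keeps the lookup from raising an IndexError if the page layout changes.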