Here is the HTML code:
<div id="someid">
<h2>Specific text 1</h2>
<a class="hyperlinks" href="link"> link1 inside specific text 1</a>
<a class="hyperlinks" href="link"> link2 inside specific text 1</a>
<a class="hyperlinks" href="link"> link3 inside specific text 1</a>
<h2>Specific text 2</h2>
<a class="hyperlinks" href="link"> link1 inside specific text 2</a>
<a class="hyperlinks" href="link"> link2 inside specific text 2</a>
<a class="hyperlinks" href="link"> link3 inside specific text 2</a>
<a class="hyperlinks" href="link"> link4 inside specific text 2</a>
<h2>Specific text 3</h2>
<a class="hyperlinks" href="link"> link1 inside specific text 3</a>
<a class="hyperlinks" href="link"> link2 inside specific text 3</a>
</div>
I need to find the links under each “Specific text” heading separately. The problem is that if I write the following code in Python:
links = root.xpath("//div[@id='someid']//a")
for link in links:
    print(link.attrib['href'])
It prints ALL the links regardless of which “Specific text x” they fall under, whereas I want something like:
print("link under Specific text:" + specific + " link:" + link.attrib['href'])
Please suggest a way to do this.
I think you will need one XPath expression for each h2 specific text.
Given an h2 specific text, you can get its following adjacent a siblings by:
The second //h2 selection handles the case where the h2 is the last one.
The expression above just exploits the XPath 1.0 intersection formula:
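In its general form the intersection idiom is usually written as $ns1[count(. | $ns2) = count($ns2)], which keeps only the nodes of $ns1 that also belong to $ns2. Here is a minimal sketch of it, assuming lxml and a trimmed copy of the sample markup; the href values and the names ns1, ns2 and expr are made up for illustration, and the concrete expression is a reconstruction rather than a quote of any particular answer.

```python
from lxml import etree

# Trimmed sample markup; the href values are hypothetical so the output is readable.
HTML = """
<div id="someid">
<h2>Specific text 1</h2>
<a class="hyperlinks" href="link-1a">link1</a>
<a class="hyperlinks" href="link-1b">link2</a>
<h2>Specific text 2</h2>
<a class="hyperlinks" href="link-2a">link1</a>
</div>
"""
root = etree.fromstring(HTML.strip())

# ns1: every <a> sibling after the chosen heading (this overshoots into later sections).
ns1 = "//h2[.='Specific text 1']/following-sibling::a"
# ns2: every <a> sibling before the next heading.
ns2 = "//h2[.='Specific text 1']/following-sibling::h2[1]/preceding-sibling::a"

# Intersection idiom: keep the nodes of ns1 that are also members of ns2.
expr = "{0}[count(. | {1}) = count({1})]".format(ns1, ns2)
hrefs = [a.get("href") for a in root.xpath(expr)]
print(hrefs)
```

A node survives the predicate only when adding it to ns2 does not change the size of ns2, that is, when it was already in ns2.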
You can find plenty of resources about this method, including many answers here on SO (check my answers too). Applying the formula is not difficult; what is difficult is recognizing when it must be applied.
Credit for the formula goes to @Michael Kay. Just google it a bit.
My expression has been extended with additional predicates to handle your specific case and combined by union (|) with an additional expression to handle the last h2.
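To make the union with the last-h2 case concrete, here is one way it might look with lxml. This is only a sketch: the helper name links_under, the href values and the $heading variable are my own choices for illustration, and the expression is one possible reading of the idea above, so adapt it to your real markup.

```python
from lxml import etree

# Sample markup with hypothetical href values so each link is identifiable.
HTML = """
<div id="someid">
<h2>Specific text 1</h2>
<a class="hyperlinks" href="t1-link1">link1</a>
<a class="hyperlinks" href="t1-link2">link2</a>
<h2>Specific text 2</h2>
<a class="hyperlinks" href="t2-link1">link1</a>
<h2>Specific text 3</h2>
<a class="hyperlinks" href="t3-link1">link1</a>
<a class="hyperlinks" href="t3-link2">link2</a>
</div>
"""
root = etree.fromstring(HTML.strip())

H2 = "//div[@id='someid']/h2[. = $heading]"
# <a> siblings that precede the next heading (empty when there is no next heading).
BEFORE_NEXT = H2 + "/following-sibling::h2[1]/preceding-sibling::a"
EXPR = (
    # Branch 1: intersection of "after this h2" and "before the next h2".
    H2 + "/following-sibling::a[count(. | {0}) = count({0})]".format(BEFORE_NEXT)
    + " | "
    # Branch 2: the last h2 has no next heading, so take everything after it.
    + H2 + "[not(following-sibling::h2)]/following-sibling::a"
)


def links_under(root, heading):
    """Return the <a> elements between the given <h2> and the next <h2>."""
    # lxml binds keyword arguments to XPath variables such as $heading.
    return root.xpath(EXPR, heading=heading)


grouped = {}
for h2 in root.xpath("//div[@id='someid']/h2"):
    grouped[h2.text] = [a.get("href") for a in links_under(root, h2.text)]
    for href in grouped[h2.text]:
        print("link under {0}: {1}".format(h2.text, href))
```

Passing the heading text as an XPath variable avoids quoting problems when a heading contains an apostrophe.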