Given HTML like the following:
...more html above...
<div class="any_name">
<p>Element A goes here</p>
<p>Element B goes here</p>
</div>
...more html below...
I need to get the XPath route of any element that contains, for example, the text “A goes”, and get something like:
/html/body/div[4]/div[2]/div/article/div/p
Note that the structure may differ in each case, so I need to search the entire document for the text every time.
I already fetch the web content successfully, but applying something like //element[text()="A goes"] with Web::Scraper doesn't seem to work.
How can I get these XPath routes from the content? Any ideas? Thanks!
You can use XML::Twig to get that. I changed the xpath you provided a little and made it more modular.
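A minimal sketch of this approach, assuming the page parses as well-formed XML (XML::Twig's default parser is an XML parser, not an HTML parser). The sample markup and the A/B/C loop are hypothetical illustrations, not the original answer's code:

```perl
use strict;
use warnings;
use XML::Twig;    # CPAN module, not part of core Perl

# Hypothetical well-formed sample standing in for the fetched page
my $html = <<'HTML';
<html>
  <body>
    <div class="any_name">
      <p>Element A goes here</p>
      <p>Element B goes here</p>
    </div>
  </body>
</html>
HTML

my $twig = XML::Twig->new;
$twig->parse($html);    # dies if the input is not well-formed

# For each letter, find every <p> whose text matches the regex
# and print the unique XPath that XML::Twig builds for it
for my $letter (qw(A B C)) {
    my @found = $twig->get_xpath(qq{//p[string()=~ /$letter goes/]});
    if (@found) {
        print "$letter: ", $_->xpath, "\n" for @found;
    }
    else {
        print "$letter: no match\n";
    }
}
```

The tag is restricted to p on purpose: since string() covers the text of descendants too, a wildcard such as //* would likely also match the enclosing div, body and html elements. If the real page is not well-formed, XML::Twig also provides parse_html, which relies on HTML::TreeBuilder being installed.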
You can use a regular expression in your xpath to find the elements that match your letter. The one with text()= didn't work in this case, because XML::Twig matches the complete text if you use = instead of =~ //. Also, the correct syntax is string(), not text().

The get_xpath method returns a list of elements. I use the xpath method on each of them, which returns the full xpath to the element. In my case that is:

There is no match for C because I did not put it in the HTML code.
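The difference between = and =~ can be checked directly. A small sketch on a hypothetical one-paragraph document, assuming XML::Twig's condition syntax accepts both forms as described:

```perl
use strict;
use warnings;
use XML::Twig;    # CPAN module, not part of core Perl

# Hypothetical one-line sample; XML::Twig's default parser requires well-formed XML
my $twig = XML::Twig->new;
$twig->parse('<html><body><div class="any_name"><p>Element A goes here</p></div></body></html>');

# '=' compares the element's complete text, so a fragment such as "A goes" does not match
my @exact = $twig->get_xpath('//p[string()="A goes"]');

# '=~' runs a Perl regex against the text, so the fragment is enough
my @regex = $twig->get_xpath('//p[string()=~ /A goes/]');

printf "exact: %d, regex: %d\n", scalar(@exact), scalar(@regex);
```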