I want to parse a web page in Groovy and extract all of the href links along with the text associated with each one.
If the page contained these links:
<a href='http://www.google.com'>Google</a><br />
<a href='http://www.apple.com'>Apple</a>
the output would be:
Google, http://www.google.com<br />
Apple, http://www.apple.com
I’m looking for a Groovy answer, a.k.a. the easy way!
Assuming the page is well-formed XHTML, slurp the XML, walk all of the tags, pick out the ‘a’ tags, and print the text and href of each one.
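Here is a minimal sketch of that approach using XmlSlurper. It assumes Groovy 3 or later (on Groovy 2.x, drop the import, since groovy.util.XmlSlurper is imported by default there), and the inline xhtml string is only a stand-in for the real page, wrapped in html/body tags so that it parses as a single well-formed document.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on 2.x remove this line

// Stand-in for the fetched page (assumption: the real page is well-formed XHTML).
def xhtml = '''<html><body>
<a href='http://www.google.com'>Google</a><br />
<a href='http://www.apple.com'>Apple</a>
</body></html>'''

def page = new XmlSlurper().parseText(xhtml)

// '**' iterates over every node depth-first; keep only the <a> elements
// and format each one as "text, href".
def links = page.'**'
        .findAll { it.name() == 'a' }
        .collect { "${it.text()}, ${it.@href}".toString() }

links.each { println it }
```

To read a live page instead of a string, parseText(xhtml) could probably be swapped for parse('http://...') with the page's address, which still only works if the server returns well-formed markup.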