I’m new to regular expressions and things like that. I only know a little about them, and I think my current problem involves them.
I have a webpage that contains text. From that page I want only the links that sit inside SPANs with class='img'.
These are the steps I have in mind:
- Grab all the SPANs that have the ‘img’ class (this is the hard step I’m looking for help with).
- Move those SPANs to a new variable.
- Parse the variable to get an array of the links (each SPAN has only one link, so this will be easy).
I’m using PHP, but the language doesn’t really matter; what I need is a way to handle the first step. Does anyone have a suggestion? Thanks 😀
Use PHP’s DOMDocument class together with the DOMXPath class to navigate to the elements you need, like this:
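```php
<?php
$dom = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML(file_get_contents('http://foo.bar'));
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Every <a> anywhere inside a <span class="img"> in the body.
// Double quotes inside the XPath keep the PHP string from ending early.
$elements = $xpath->query('/html/body//span[@class="img"]//a');

foreach ($elements as $a) {
    // "\n" must be double-quoted for PHP to turn it into a newline.
    echo $a->getAttribute('href'), "\n";
}
```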
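If you want to follow the three steps from the question more literally (collect the spans first, then pull the single link out of each), a sketch along these lines should work. The sample HTML is made up and stands in for the downloaded page, and the code assumes, as the question says, that each span holds at most one link.

```php
<?php
// Hypothetical sample page; in practice load the real page with file_get_contents().
$html = '<html><body>
  <p>Intro <a href="/skip-me">not wanted</a></p>
  <span class="img"><a href="/pics/1.jpg">one</a></span>
  <span class="other"><a href="/skip-too">nope</a></span>
  <span class="img"><a href="/pics/2.jpg">two</a></span>
</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // tolerate sloppy HTML without warnings
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Steps 1 and 2: grab every <span class="img"> and keep them in a variable.
$spans = $xpath->query('//span[@class="img"]');

// Step 3: each span holds one link, so take the first <a> inside it.
$links = [];
foreach ($spans as $span) {
    // ".//a" is relative to the current span (second argument = context node).
    $a = $xpath->query('.//a', $span)->item(0);
    if ($a !== null) {
        $links[] = $a->getAttribute('href');
    }
}

print_r($links);
```

Keeping the spans in `$spans` lets you inspect them (their text, other attributes) before extracting the links, which is what the “move to a new variable” step was for.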
You can learn more about the XPath language on the W3C page.
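One thing to watch: `@class='img'` compares the whole attribute value, so a span written as `class="img thumb"` would be skipped. If that can happen on your page, the usual XPath 1.0 idiom is to pad the normalized class list with spaces and look for ` img `. The snippet below compares both on a made-up fragment; the `$hrefs` closure is only a local helper for this example, not part of the DOM API.

```php
<?php
// Hypothetical fragment with an exact class, a multi-class span and a look-alike.
$html = '<body>
  <span class="img"><a href="/a.png">a</a></span>
  <span class="img thumb"><a href="/b.png">b</a></span>
  <span class="imgx"><a href="/c.png">c</a></span>
</body>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Exact match: the class attribute must be exactly "img".
$exact = $xpath->query('//span[@class="img"]//a');

// Token match: "img" as one of several space-separated classes.
// Padding with spaces keeps "imgx" from matching.
$token = $xpath->query(
    '//span[contains(concat(" ", normalize-space(@class), " "), " img ")]//a'
);

// Local helper: collect the href of every node in a list.
$hrefs = function (DOMNodeList $nodes): array {
    $out = [];
    foreach ($nodes as $a) {
        $out[] = $a->getAttribute('href');
    }
    return $out;
};

$exactLinks = $hrefs($exact); // ['/a.png']
$tokenLinks = $hrefs($token); // ['/a.png', '/b.png']
print_r($tokenLinks);
```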