I’m trying to use cURL to get some data from the web. I have a URL like somewebsite.com. On this website, there are a whole bunch of <div> elements that have class="control-element" and this markup:
<div class="control-element">
<a href="http://someurl.com/and/some/path">Anchor Text</a>
</div>
How should I extract the URL and the anchor text for each of these links? Should I be using regex for this, or is there a better way to do it?
I think in this particular case you would be fine using file_get_contents() instead of cURL.
For HTML parsing, take a look at Simple HTML DOM.
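To give a feel for what parser-based extraction looks like, here is a rough sketch. Note that it swaps in PHP's bundled DOMDocument and DOMXPath classes rather than Simple HTML DOM, so it is not that library's API. The sample markup, the extra "other" div, and the variable names are assumptions made up for illustration; on a real page you would load the fetched HTML instead.

```php
<?php
// Sample markup standing in for the fetched page (assumption for illustration).
// In practice this might come from file_get_contents('http://somewebsite.com').
$html = <<<HTML
<div class="control-element">
<a href="http://someurl.com/and/some/path">Anchor Text</a>
</div>
<div class="other">
<a href="http://someurl.com/ignored">Ignored</a>
</div>
HTML;

// Real-world HTML is often not well-formed, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every <a href> inside a div whose class list contains "control-element".
$xpath = new DOMXPath($doc);
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " control-element ")]//a[@href]';

$domLinks = [];
foreach ($xpath->query($query) as $a) {
    $domLinks[] = [
        'url'  => $a->getAttribute('href'),
        'text' => trim($a->textContent),
    ];
}

foreach ($domLinks as $link) {
    echo $link['url'], ' => ', $link['text'], PHP_EOL;
}
```

A parser-based approach like this tends to cope better than pattern matching when attributes are reordered or the markup is nested differently.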
If you don’t want to use any third-party libraries, here is an example using regex:
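What follows is only a minimal sketch of the regex approach, not a definitive implementation. It assumes the markup looks exactly like the snippet in the question (the class attribute is the div's only attribute, and href is the first attribute on the anchor). The sample HTML string, the extra "other" div, and the variable names are made up for illustration.

```php
<?php
// Sample markup standing in for the fetched page (assumption for illustration).
// In practice this might come from file_get_contents('http://somewebsite.com').
$html = <<<HTML
<div class="control-element">
<a href="http://someurl.com/and/some/path">Anchor Text</a>
</div>
<div class="other">
<a href="http://someurl.com/ignored">Ignored</a>
</div>
HTML;

// Match an <a> that immediately follows an opening control-element div.
// Group 1 captures the href value, group 2 the anchor's inner content.
$pattern = '~<div\s+class="control-element"\s*>\s*<a\s+href="([^"]*)"[^>]*>(.*?)</a>~is';

$links = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $links[] = [
            'url'  => $m[1],
            // Drop any nested tags inside the anchor and trim whitespace.
            'text' => trim(strip_tags($m[2])),
        ];
    }
}

foreach ($links as $link) {
    echo $link['url'], ' => ', $link['text'], PHP_EOL;
}
```

For the sample above this prints `http://someurl.com/and/some/path => Anchor Text`. A pattern like this is brittle: it stops matching as soon as the div gains another attribute or the anchor's attributes are reordered, so it is only suitable when the page structure is known and stable.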