I have a hard time visualizing and conceiving a way to scrape this page: http://www.morewords.com/ends-with/aw for the words themselves. Given a URL, I’d like to get the contents and then generate a PHP array with all the words listed, which in the source look like
<a href="/word/word1/">word1</a><br />
<a href="/word/word2/">word2</a><br />
<a href="/word/word3/">word3</a><br />
<a href="/word/word4/">word4</a><br />
There are a few ways I have been thinking about doing this, and I’d appreciate it if you could help me decide on the most efficient one. I’d also appreciate any advice or examples on how to achieve this. I understand it’s not incredibly complicated, but I could use the help of you advanced hackers.
- Use some sort of jQuery $.each() to loop through and somehow cast them into a JS array, and then transcribe (probably heavily taxing).
- Use some sort of cURL (I don’t really have much experience with cURL).
- Use some sophisticated find and replace with regex.
You tagged it as PHP, so here is a PHP solution 🙂
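One way this could look is sketched below. It parses the HTML with PHP's built-in DOMDocument and keeps the text of every link whose href starts with /word/, which is an assumption based on the markup sample in the question. The function name extractWords is hypothetical, chosen here for illustration.

```php
<?php
// Hypothetical helper: collect the text of every <a href="/word/..."> link.
function extractWords(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return [];
    }

    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $words = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Assumption: word links look like /word/<word>/ as in the question.
        if (strpos($href, '/word/') === 0) {
            $words[] = trim($anchor->textContent);
        }
    }
    return $words;
}

// Usage (needs allow_url_fopen enabled):
// $html  = file_get_contents('http://www.morewords.com/ends-with/aw');
// $words = extractWords($html);
```

Filtering on the href keeps navigation and other unrelated links out of the result, and using a DOM parser avoids the fragility of matching HTML with regex.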
CodePad.
If allow_url_fopen is disabled in php.ini, you could use cURL to get the HTML.
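A minimal sketch of the cURL route, assuming the cURL extension is installed. The function name fetchHtml and the 10-second timeout are illustrative choices, not anything prescribed by the page being scraped.

```php
<?php
// Hypothetical helper: fetch a URL with cURL.
// Returns the response body, or null if the request failed.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // seconds; an arbitrary choice
    ]);
    $body = curl_exec($ch); // false on failure
    // The handle is released when $ch goes out of scope.
    return $body === false ? null : $body;
}

// Usage:
// $html = fetchHtml('http://www.morewords.com/ends-with/aw');
// if ($html === null) { /* handle the failed request */ }
```

The returned string can then be parsed in the same way as HTML obtained through file_get_contents().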