URL:
http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=Lost_(TV_series)&format=xml
This outputs something like:
<api><parse><text xml:space="preserve">text...</text></parse></api>
How do I get just the content between <text xml:space="preserve"> and </text>?
I used cURL to fetch the whole response from this URL, which gives me:
$html = curl_exec($curl_handle);
What’s the next step?
Use PHP's DOM extension (DOMDocument) to parse it. Do it like this:
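A minimal sketch of that approach, assuming `$html` holds the XML string returned by `curl_exec()` (which requires `CURLOPT_RETURNTRANSFER` to be set on the handle). The helper name `extractParseText` is hypothetical, and the sample string below stands in for the real API response so the snippet runs without a network call:

```php
<?php
// Hypothetical helper: returns the contents of the first <text> element,
// or null if the input is empty, not well-formed XML, or has no <text> node.
function extractParseText($xml)
{
    // curl_exec() returns false on failure; loadXML() also rejects empty input.
    if (!is_string($xml) || $xml === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Suppress libxml warnings; loadXML() returns false on malformed XML.
    if (!@$doc->loadXML($xml)) {
        return null;
    }

    $nodes = $doc->getElementsByTagName('text');
    if ($nodes->length === 0) {
        return null;
    }

    // textContent gives the decoded character data inside <text>...</text>.
    return $nodes->item(0)->textContent;
}

// Stand-in for the API response; in the question this would be
// $html = curl_exec($curl_handle);
$html = '<api><parse><text xml:space="preserve">text...</text></parse></api>';

echo extractParseText($html);
```

Because the API escapes the page HTML inside `<text>`, `textContent` should hand back the unescaped HTML as a plain string, which you can then echo or feed to another parser.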
This outputs: