I’m trying to write a simple Python application that fetches the contents of a Wikipedia article. As an example, I’m trying to get the contents of the page on the fruit apple. This is my query:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&titles=apple
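A minimal sketch of sending that same query from Python using only the standard library. It assumes the `https://` form of the endpoint (plain `http://` should redirect to it), and the `User-Agent` string is a placeholder you should replace with your own contact details. The helper names `build_query_url` and `fetch` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: HTTPS version of the endpoint used in the query above.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, fmt="xml"):
    """Build the revisions/content query for a page title (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": fmt,
        "titles": title,
    }
    # urlencode keeps insertion order and escapes spaces as '+'.
    return API_URL + "?" + urlencode(params)


def fetch(url):
    """Download the response body as text (needs network access)."""
    # Placeholder User-Agent; Wikimedia asks clients to identify themselves.
    req = Request(url, headers={"User-Agent": "wiki-fetch-example/0.1 (you@example.com)"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(build_query_url("apple"))
    # raw = fetch(build_query_url("apple"))
```

Keeping URL construction separate from the download makes it easy to switch `fmt` to `"json"` without touching the networking code.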
This is what the output looks like (formatted):
But this doesn’t really look like XML; it looks more like PHP (I think). Should I just parse it myself in Python, or is there a better way?
It’s not PHP; it’s MediaWiki markup (wikitext). The response itself is XML, but the page content inside it is returned as raw wikitext.
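To see that only the page body is wikitext, here is a sketch that parses the XML envelope with `xml.etree.ElementTree` and pulls out the text of the `<rev>` element. The sample response is made up and trimmed; it assumes the legacy layout where the wikitext sits directly in `<rev>`, and also handles the layout where it sits in `<rev><slots><slot>` (used when `rvslots=main` is requested). `extract_wikitext_from_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def extract_wikitext_from_xml(xml_text):
    """Return the wikitext from the first <rev> element of an API XML response."""
    root = ET.fromstring(xml_text)
    rev = root.find(".//rev")
    if rev is None:
        raise ValueError("no <rev> element in response")
    # With rvslots=main the content is nested in <slots><slot>; otherwise it is in <rev>.
    slot = rev.find(".//slot")
    node = slot if slot is not None else rev
    return node.text or ""


# Made-up, trimmed response for illustration only.
SAMPLE = """<?xml version="1.0"?>
<api><query><pages><page pageid="1" ns="0" title="Apple"><revisions>
<rev xml:space="preserve">An '''apple''' is a [[fruit]].</rev>
</revisions></page></pages></query></api>"""

if __name__ == "__main__":
    print(extract_wikitext_from_xml(SAMPLE))
```

The XML parser handles the envelope; what you get back is still wikitext (`'''bold'''`, `[[links]]`), which needs a separate step if you want plain text or HTML.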
See the MediaWiki API page on parsing wikitext: http://www.mediawiki.org/wiki/API:Parsing_wikitext
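If rendered HTML is more useful than raw wikitext, the API can do the conversion with `action=parse`. A sketch, assuming `prop=text` with `format=json&formatversion=2`, where the HTML comes back as a plain string under `parse.text`. The sample response is invented and heavily trimmed, and `build_parse_url` / `extract_html` are hypothetical helpers.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(title):
    """URL asking the API to render a page's wikitext to HTML."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urlencode(params)


def extract_html(json_text):
    """Return the rendered HTML from a formatversion=2 parse response."""
    data = json.loads(json_text)
    return data["parse"]["text"]


# Invented, trimmed response for illustration only.
SAMPLE = json.dumps({
    "parse": {
        "title": "Apple",
        "pageid": 1,
        "text": '<div class="mw-parser-output"><p>An <b>apple</b> is a fruit.</p></div>',
    }
})

if __name__ == "__main__":
    print(build_parse_url("Apple"))
    print(extract_html(SAMPLE))
```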
Personally, I find the JSON-formatted version (`format=json`) easier to work with once it’s parsed.
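A sketch of the JSON route using the standard `json` module. It assumes `formatversion=2` together with `rvslots=main`, where `query.pages` is a list and the wikitext lives at `revisions[0].slots.main.content`; the legacy format nests things differently. The sample response is made up and `extract_wikitext_from_json` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode


def build_json_query_url(title):
    """Same content query, requesting JSON with the newer response layout."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
        "titles": title,
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


def extract_wikitext_from_json(json_text):
    """Return the wikitext of the first page in a formatversion=2 response."""
    data = json.loads(json_text)
    page = data["query"]["pages"][0]
    if page.get("missing"):
        raise KeyError(f"page not found: {page.get('title')}")
    return page["revisions"][0]["slots"]["main"]["content"]


# Made-up, trimmed response for illustration only.
SAMPLE = json.dumps({
    "query": {"pages": [{
        "pageid": 1, "ns": 0, "title": "Apple",
        "revisions": [{"slots": {"main": {
            "contentmodel": "wikitext",
            "content": "An '''apple''' is a [[fruit]].",
        }}}],
    }]}
})

if __name__ == "__main__":
    print(extract_wikitext_from_json(SAMPLE))
```

After `json.loads`, the response is ordinary dicts and lists, so there is no tree-walking as with the XML version; the result is still wikitext, though.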