I’m using Python’s ElementTree to parse XML files. I call findall to get all “revision” subelements, but when I iterate through the result, they are not in document order. What could I be doing wrong?
Here’s my code:
    allrevisions = page.findall('{http://www.mediawiki.org/xml/export-0.5/}revision')
    for rev in allrevisions:
        print(rev)
        print(rev.find('{http://www.mediawiki.org/xml/export-0.5/}timestamp').text)
Here’s a link to the document I’m parsing: http://pastie.org/2780983
Thanks,
bsg
Oops. By going through my code and running it piece by piece, I worked out the problem: I had stuck a reverse() on the elements list in the wrong place, which was causing all the trouble. Thank you so much for your help. I’m sorry it was such a silly issue.
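For anyone who hits the same thing, here is a minimal sketch of how a misplaced reverse() produces this symptom. The XML sample and variable names are made up for illustration and are not taken from the actual code or document in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, not the document from the question.
XML = "<page><revision>1</revision><revision>2</revision><revision>3</revision></page>"
page = ET.fromstring(XML)

revisions = page.findall("revision")        # a list, in document order
in_order = [rev.text for rev in revisions]

# list.reverse() reverses the list in place and returns None, so every
# later loop over `revisions` sees the reversed order.
revisions.reverse()
after_reverse = [rev.text for rev in revisions]

# reversed() iterates backwards without modifying the list it is given.
fresh = page.findall("revision")
newest_first = [rev.text for rev in reversed(fresh)]
still_in_order = [rev.text for rev in fresh]

print(in_order)
print(after_reverse)
```

If a reversed view is only needed in one place, iterating with reversed() avoids changing the list that the rest of the code relies on.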
The documentation for ElementTree says that findall returns the elements in document order.
A quick test shows the correct behaviour:
Result:
It would be helpful to see the document you are parsing.
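A check along these lines can be sketched as follows. This is only an illustration: the sample XML is a heavily shortened, hypothetical stand-in for a MediaWiki export, and the timestamps are invented. Only the namespace URI comes from the question.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the question's code.
NS = "{http://www.mediawiki.org/xml/export-0.5/}"

# Hypothetical sample data; the timestamps are invented for illustration.
SAMPLE = """\
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Example</title>
    <revision><timestamp>2011-01-01T00:00:00Z</timestamp></revision>
    <revision><timestamp>2011-02-01T00:00:00Z</timestamp></revision>
    <revision><timestamp>2011-03-01T00:00:00Z</timestamp></revision>
  </page>
</mediawiki>
"""

root = ET.fromstring(SAMPLE)
page = root.find(NS + "page")

# Collect the timestamps in the order findall yields the revisions.
timestamps = [
    rev.find(NS + "timestamp").text
    for rev in page.findall(NS + "revision")
]

for ts in timestamps:
    print(ts)
```

With this sample, the timestamps come out in the same sequence in which the revision elements are written in the XML.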
Update:
Using the source data you provided:
Result:
    ‘The Mind
    {{db-spam}
    ‘The Mind
    '''The Min
    <!-- Pleas
The same order as they appear in the document.