I have many, many .xml files and I need to extract some coordinates from them.
Extracting data straight from .xml files seems to be very, very complicated, so I am saving the .xml files as .txt files and extracting the data that way. However, when I open a .txt file, my data is all bunched together on about six lines. All the scripts I have found so far select data by reading the first word on each line, but obviously that won't work for me!
I need to extract the numbers between these tags:
<gml:lowerCorner>137796 483752</gml:lowerCorner> <gml:upperCorner>138178 484222</gml:upperCorner>
In the text file they are all grouped together! Does anyone know how to extract this data? Thank you!
This is absolutely the wrong approach. Leave it alone and improve your ways 🙂
Seriously, if the file is XML, then just use an XML parser to read it. Learning how to do that in Python isn't hard. It will make your life easier now, and much easier in the future when you face more complex parsing needs, because you won't have to re-learn it.
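As a minimal sketch of the parser approach for the snippet in the question, the code below uses Python's standard xml.etree.ElementTree module. It assumes the gml prefix is bound to the namespace http://www.opengis.net/gml and wraps the two elements in a gml:Envelope. Both are assumptions, so check the xmlns:gml declaration in your own files. The names SAMPLE, NS, to_point and parse_corners are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: the gml prefix maps to this namespace URI. Check the
# xmlns:gml declaration in your own files and adjust it if it differs.
NS = {"gml": "http://www.opengis.net/gml"}

# Hypothetical sample document built around the snippet from the question.
SAMPLE = """<gml:Envelope xmlns:gml="http://www.opengis.net/gml">
  <gml:lowerCorner>137796 483752</gml:lowerCorner>
  <gml:upperCorner>138178 484222</gml:upperCorner>
</gml:Envelope>"""


def to_point(text):
    """Turn a space-separated coordinate string into a tuple of floats."""
    return tuple(float(v) for v in text.split())


def parse_corners(root):
    """Return (lower, upper) from the first lowerCorner/upperCorner below root."""
    lower = root.find(".//gml:lowerCorner", NS)
    upper = root.find(".//gml:upperCorner", NS)
    if lower is None or upper is None:
        raise ValueError("no gml:lowerCorner/gml:upperCorner found")
    return to_point(lower.text), to_point(upper.text)


# For a file on disk you would use ET.parse("your_file.xml").getroot() instead.
root = ET.fromstring(SAMPLE)
lower, upper = parse_corners(root)
print(lower, upper)  # (137796.0, 483752.0) (138178.0, 484222.0)
```

The parser handles the line layout for you, so it does not matter how the elements are spread across lines in the file.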
Look at xml.etree.ElementTree.ElementTree: parse your file into an ElementTree object (say, tree), then read the module's documentation and see what you can do with tree. You'll be surprised how simple it is to get at information this way. If you have specific questions about extracting data, I suggest you open another question that specifies the format of the XML file you need to parse and what data you need to take out of it. I'm sure you will have working code suggested to you in a matter of minutes.