I have a 30 GB XML file which contains 2 classes of data. The data of class 1 has corresponding data in class 2:
<id="11" class="1" bestmatchingid="50" Body="abc"> </id>
.
.
.
<id="9999890" class="2" MatchingClass1Id="11" Body="xyz"></id>
The task is to extract each class 1 Body together with the Body of the corresponding class 2 record, e.g. where
class 1's id (11) == MatchingClass1Id of the class 2 record (the one with id 9999890).
I am currently doing this with string comparisons in Python. Is there a more efficient way to do it in Python, considering that the file is 30 GB?
lxml works well for this purpose. Since you are a beginner, refer to this tutorial to understand the basics:
http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-view.html
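The iterparse method is an efficient way to solve your problem: it parses the file incrementally, so the 30 GB document can be processed without loading all of it into memory.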
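A minimal sketch of the iterparse approach follows. It is an illustration under several assumptions, not a drop-in solution: the records are assumed to be well-formed elements with a hypothetical tag name `row` (the `<id=...>` lines in the question are not valid XML as written), class 1 records are assumed to appear before the class 2 records that reference them, and the class 1 bodies are assumed to fit in an in-memory dict. The function name `iter_matching_bodies` is made up for this example. The sketch uses the standard library's `xml.etree.ElementTree.iterparse`; `lxml.etree.iterparse` offers a very similar interface.

```python
import io
import xml.etree.ElementTree as ET


def iter_matching_bodies(source, tag="row"):
    """Yield (class1_body, class2_body) pairs while streaming through the XML.

    `source` is a filename or a binary file object. `tag` is a hypothetical
    element name; replace it with the real one used in your file.
    """
    class1_bodies = {}  # class 1 id -> Body (assumed to fit in memory)

    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element

    for event, elem in context:
        # Only act once a record element has been read completely.
        if event != "end" or elem.tag != tag:
            continue

        if elem.get("class") == "1":
            class1_bodies[elem.get("id")] = elem.get("Body")
        elif elem.get("class") == "2":
            body1 = class1_bodies.get(elem.get("MatchingClass1Id"))
            if body1 is not None:
                yield body1, elem.get("Body")

        # Drop the records parsed so far so the tree does not keep growing.
        root.clear()


if __name__ == "__main__":
    # Tiny stand-in for the real file, using the assumed <row .../> layout.
    sample = b"""<rows>
      <row id="11" class="1" bestmatchingid="50" Body="abc"/>
      <row id="9999890" class="2" MatchingClass1Id="11" Body="xyz"/>
    </rows>"""
    for body1, body2 in iter_matching_bodies(io.BytesIO(sample)):
        print(body1, body2)
```

For the real file, pass its path instead of the `io.BytesIO` object. If the class 1 bodies turn out to be too large to hold in a dict, the same loop could write them to a disk-backed store instead, such as a `sqlite3` table keyed by id.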