I have a file that contains the following type and structure of data:
<data>
<from>A</from>
<to>B</to>
<data>
<name>EXAMPLE ONE</name>
<info>
<some_data>1</some_data>
<more_data>2</more_data>
</info>
<random>
<some_tag>
</foobar>
<foo>
<bar />
</foo>
</random>
</data>
<data>
<name>EXAMPLE TWO</name>
<info>
<some_data>3</some_data>
<more_data>4</more_data>
</info>
<random>
<some_tag>
</foobar>
<foo>
<bar />
</foo>
</random>
</data>
</data>
<data>
<from>C</from>
<to>D</to>
<data>
<name>EXAMPLE</name>
<info>
<some_data>1</some_data>
<more_data>2</more_data>
</info>
<random>
<some_tag>
</foobar>
<foo>
<bar />
</foo>
</random>
</data>
</data>
The data continues in this exact structure throughout the file, except that the innermost <data>...</data> block can be repeated n times. Each record always starts with a <data> tag, followed by the <from>...</from> and <to>...</to> tags.
What I want to do is extract all the data between the outermost <data> tags, using <from> and <to> as a description of each data block. I of course also want to separate the innermost <data> blocks from each other and save this data in a way that makes it clear which parent block each inner block belongs to.
I don’t have an exact idea of how I want to save the data, so any examples are appreciated!
I’m testing this with the Python module BeautifulSoup and have searched and read a lot of examples here, but haven’t found anything that points me in the right direction.
Thanks!
The fact that you are reusing the tag name <data> both as the container of your records and as an element inside it creates problems. BeautifulSoup is forgiving of such issues, and here is an approach you may want to use if you cannot go back and change the XML structure.
Assign the data to a variable. This may be read in from a text file, of course:
Process the data:
And the result with this dummy data:
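A minimal sketch of the three steps above (assign, process, show the result), assuming the bs4 package with Python's built-in html.parser. This is an illustration under those assumptions, not necessarily the exact code this answer originally used. The helper name parse_blocks and the dict keys ("from", "to", "records") are hypothetical, and the sample is a trimmed copy of the question's data with the <random> sections left out. How a stray closing tag such as </foobar> gets repaired depends on the parser, so it is worth checking the result on the real file.

```python
import json

from bs4 import BeautifulSoup

# Trimmed copy of the question's sample (the <random> sections are omitted).
# The text could equally come from a file, e.g. open(path).read().
xml_text = """
<data>
<from>A</from>
<to>B</to>
<data>
<name>EXAMPLE ONE</name>
<info>
<some_data>1</some_data>
<more_data>2</more_data>
</info>
</data>
<data>
<name>EXAMPLE TWO</name>
<info>
<some_data>3</some_data>
<more_data>4</more_data>
</info>
</data>
</data>
<data>
<from>C</from>
<to>D</to>
<data>
<name>EXAMPLE</name>
<info>
<some_data>1</some_data>
<more_data>2</more_data>
</info>
</data>
</data>
"""


def parse_blocks(text):
    """Return one dict per outermost <data> block (hypothetical helper)."""
    soup = BeautifulSoup(text, "html.parser")
    blocks = []
    # recursive=False limits each search to direct children, which is what
    # tells the outer <data> containers apart from the inner <data> records.
    for outer in soup.find_all("data", recursive=False):
        block = {
            "from": outer.find("from", recursive=False).get_text(strip=True),
            "to": outer.find("to", recursive=False).get_text(strip=True),
            "records": [],
        }
        for inner in outer.find_all("data", recursive=False):
            info = inner.find("info")
            block["records"].append({
                "name": inner.find("name").get_text(strip=True),
                "some_data": info.find("some_data").get_text(strip=True),
                "more_data": info.find("more_data").get_text(strip=True),
            })
        blocks.append(block)
    return blocks


blocks = parse_blocks(xml_text)

# One possible way to save the result: nested JSON keeps every inner
# record attached to its parent from/to pair.
print(json.dumps(blocks, indent=2))
```

With this sample, blocks holds two entries: the first has from "A", to "B" and the records EXAMPLE ONE and EXAMPLE TWO, the second has from "C", to "D" and the single record EXAMPLE. All values are kept as strings; convert them with int() if numbers are needed.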