I have a file that contains the following type and structure of data: <data>

Asked: June 8, 2026 (2026-06-08T07:49:31+00:00)

I have a file that contains the following type and structure of data:

<data>
    <from>A</from>
    <to>B</to>
    <data>
        <name>EXAMPLE ONE</name>
        <info>
            <some_data>1</some_data>
            <more_data>2</more_data>
        </info>
        <random>
            <some_tag>
            </foobar>
            <foo>
                <bar />
           </foo>
        </random>
    </data>
    <data>
        <name>EXAMPLE TWO</name>
        <info>
            <some_data>3</some_data>
            <more_data>4</more_data>
        </info>
        <random>
            <some_tag>
            </foobar>
            <foo>
                <bar />
           </foo>
        </random>
   </data>
</data>
<data>
    <from>C</from>
    <to>D</to>
    <data>
        <name>EXAMPLE</name>
        <info>
            <some_data>1</some_data>
            <more_data>2</more_data>
        </info>
        <random>
            <some_tag>
            </foobar>
            <foo>
                <bar />
           </foo>
        </random>
    </data>
 </data>

The data continues in this exact structure throughout the file, except that the innermost <data>...</data> block can be repeated n times. The structure always starts with a <data> tag, followed by the <from>...</from> and <to>...</to> tags.

What I want to do is extract all the data between the outermost <data> tags, with <to> and <from> as a description of each data block. I of course also want to separate the innermost <data> blocks from each other and save this data in a way that makes it clear that the innermost data is related to its parent data.

I don’t have an exact idea of how I want to save the data, so any examples are appreciated!

I’m testing this with the Python module BeautifulSoup and have searched and read a lot of examples here, but haven’t found anything that points me in the right direction.

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-06-08T07:49:34+00:00

Reusing the tag name <data> for both the container of your records and an element inside it causes problems. BeautifulSoup is forgiving of such issues, so here is an approach you can use if you cannot go back and change the XML structure.

Assign the data to a variable. It could, of course, be read in from a text file instead:

data = '''<data>
    <from>A</from>
    <to>B</to>
    <data>
        <name>EXAMPLE ONE</name>
        <info>
            <some_data>1</some_data>
            <more_data>2</more_data>
        </info>
        <random>
            <some_tag>
            </foobar>
            <foo>
                <bar />
           </foo>
        </random>
    </data>
    <data>
        <name>EXAMPLE TWO</name>
        <info>
            <some_data>3</some_data>
            <more_data>4</more_data>
        </info>
        <random>
            <some_tag>
            </foobar>
            <foo>
                <bar />
           </foo>
        </random>
   </data>
</data>
<data>
    <from>C</from>
    <to>D</to>
    <data>
        <name>EXAMPLE</name>
        <info>
            <some_data>1</some_data>
            <more_data>2</more_data>
        </info>
        <random>
            <some_tag>
            </foobar>
            <foo>
                <bar />
           </foo>
        </random>
    </data>
 </data>'''
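
If the data lives in a file rather than in a literal string, a small helper along these lines could load it. This is only a sketch: load_data and the file name records.xml are made-up names for illustration, and UTF-8 encoding is an assumption about your file.

```python
from pathlib import Path


def load_data(path):
    """Return the whole file as one string, ready to hand to the parser."""
    # Assumes the file is UTF-8 encoded; adjust if yours is not.
    return Path(path).read_text(encoding="utf-8")


# "records.xml" is a hypothetical file name; substitute your own path.
# data = load_data("records.xml")
```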

Process the data:

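from pprint import pprint

from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4

store = {}  # maps (from, to) -> list of inner records
key = ()    # (from, to) of the outer block currently being read

# the lenient html.parser copes with the mismatched tags in the sample
soup = BeautifulSoup(data, 'html.parser')

# find_all returns outer and inner <data> tags in document order
for rec in soup.find_all('data'):
    if rec.find('from'):
        # outer block: remember its (from, to) pair as the current key
        key = (rec.find('from').text,
               rec.find('to').text)
    else:
        # inner block: collect its fields under the current key
        info = rec.find('info')
        item = {}
        item['name'] = rec.find('name').text
        item['some_data'] = info.find('some_data').text
        item['more_data'] = info.find('more_data').text
        store.setdefault(key, []).append(item)

pprint(store)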

And the result with this dummy data (output captured under Python 2; Python 3 prints the same strings without the u prefix):

{(u'A', u'B'): [{'more_data': u'2',
                 'name': u'EXAMPLE ONE',
                 'some_data': u'1'},
                {'more_data': u'4',
                 'name': u'EXAMPLE TWO',
                 'some_data': u'3'}],
 (u'C', u'D'): [{'more_data': u'2', 'name': u'EXAMPLE', 'some_data': u'1'}]}
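
Since the question also asks how the data could be saved, one option is JSON. This is a sketch added as an assumption, not part of the original answer. Tuples such as (from, to) cannot be JSON object keys, so the sketch first reshapes the dictionary into a list of records. The to_records helper and the records.json file name are hypothetical, and the sample store is typed in by hand to mirror the result above.

```python
import json

# Hand-written copy of the structure shown above: (from, to) -> list of items.
store = {
    ("A", "B"): [
        {"name": "EXAMPLE ONE", "some_data": "1", "more_data": "2"},
        {"name": "EXAMPLE TWO", "some_data": "3", "more_data": "4"},
    ],
    ("C", "D"): [
        {"name": "EXAMPLE", "some_data": "1", "more_data": "2"},
    ],
}


def to_records(store):
    """Turn the tuple-keyed dict into a list of plain dicts that JSON can hold."""
    return [
        {"from": src, "to": dst, "items": items}
        for (src, dst), items in store.items()
    ]


records = to_records(store)
json_text = json.dumps(records, indent=2)

# Writing it out is then one call; "records.json" is a hypothetical file name.
# with open("records.json", "w", encoding="utf-8") as fh:
#     fh.write(json_text)
```

Each record keeps the from and to values next to its items, so the parent/child relationship survives the round trip through json.loads.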
