What would be the best way to do this?
The input string is:
<133_3><135_3><116_2>The other system worked for about 1 month</116_2> got some good images <137_3>on it then it started doing the same thing as the first one</137_3> so then I quit using either camera now they are just sitting and collecting dust.</135_3></133_3>
The expected output is:
{'The other system worked for about 1 month got some good images on it then it started doing the same thing as the first one so then I quit using either camera now they are just sitting and collecting dust.': [133, 135], 'The other system worked for about 1 month': [116], 'on it then it started doing the same thing as the first one': [137]}
That seems like a recursive regexp search, but I can’t figure out exactly how.
For now I can only think of a tedious recursive function, but I have a feeling that there should be a better way.
Related question: Can regular expressions be used to match nested patterns?
Use expat or another XML parser; it’s more explicit than a regexp-based approach, and you’re dealing with XML-like data anyway.
However, note that XML element names can’t start with a digit, as they do in your example.
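To illustrate the naming rule, here is a minimal sketch of one possible workaround: rewrite the numeric tags with a letter prefix before handing the string to expat. The helper names `prefix_numeric_tags` and `is_well_formed` and the prefix `t` are made up for this example, and the regexp assumes every tag has the `digits_digits` shape shown in the question.

```python
import re
import xml.parsers.expat

# Shortened version of the input from the question.
raw = "<133_3><116_2>worked for about 1 month</116_2> got some good images</133_3>"

def prefix_numeric_tags(text, prefix="t"):
    # Hypothetical helper: turn <133_3> / </133_3> into <t133_3> / </t133_3>.
    # Only tags shaped like digits_digits are rewritten; plain text is untouched.
    return re.sub(
        r"<(/?)(\d+_\d+)>",
        lambda m: "<{}{}{}>".format(m.group(1), prefix, m.group(2)),
        text,
    )

def is_well_formed(text):
    # Feed the whole string to expat; a syntax problem raises ExpatError.
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True

print(is_well_formed(raw))                       # numeric names are rejected
print(is_well_formed(prefix_numeric_tags(raw)))  # prefixed names parse
```

When collecting results, the prefix can be stripped from each element name again to recover the original numbers.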
Here’s a parser that will do what you need, although you’ll need to tweak it to combine duplicate elements into one dict key: