I am trying to parse Solr output of the form:
<doc>
<str name="source">source:A</str>
<str name="url">URL:A</str>
<date name="p_date">2012-09-08T10:02:01Z</date>
</doc>
<doc>
<str name="source">source:B</str>
<str name="url">URL:B</str>
<date name="p_date">2012-08-08T11:02:01Z</date>
</doc>
I am keen on using Beautiful Soup (versions that have BeautifulStoneSoup; I think prior to BS4) for parsing the docs.
I have used Beautiful Soup for HTML parsing, but somehow I am not able to find an efficient way to extract the contents of the tags.
I have written:
for tags in soup('doc'):
    print tags.renderContents()
I sense that I could force my way to the output (say, by ‘soup’ing each result again), but I would appreciate an efficient way to extract the data.
The output I need is:
source:A
URL:A
2012-09-08T10:02:01Z
source:B
URL:B
2012-08-08T11:02:01Z
Thanks
Use an XML parser for this task instead;
xml.etree.ElementTree is included with Python:
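A minimal sketch of that approach, assuming the <doc> elements sit inside a single root element. The sample in the question has no root, so a <result> wrapper is added here purely for illustration (a real Solr response should already have its own root), and the helper name extract_values is hypothetical:

```python
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in an assumed <result> root element,
# because an XML document must have exactly one root.
SOLR_XML = """<result>
<doc>
<str name="source">source:A</str>
<str name="url">URL:A</str>
<date name="p_date">2012-09-08T10:02:01Z</date>
</doc>
<doc>
<str name="source">source:B</str>
<str name="url">URL:B</str>
<date name="p_date">2012-08-08T11:02:01Z</date>
</doc>
</result>"""


def extract_values(xml_text):
    """Return the text of each child of every <doc>, in document order."""
    root = ET.fromstring(xml_text)
    values = []
    # iter('doc') finds <doc> elements at any depth below the root
    for doc in root.iter('doc'):
        # iterating an element yields its direct children (<str>, <date>)
        for field in doc:
            values.append(field.text)
    return values


values = extract_values(SOLR_XML)
for value in values:
    print(value)
```

If only one field is needed, something like doc.find("str[@name='url']").text inside the loop should select it by its name attribute.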