I am trying to parse a solr output of the form: <doc> <str name=source>source:A</str>

Question


Asked: June 17, 2026 (2026-06-17T14:56:39+00:00)


I am trying to parse a solr output of the form:

<doc>
<str name="source">source:A</str>
<str name="url">URL:A</str>
<date name="p_date">2012-09-08T10:02:01Z</date>
</doc>
<doc>
<str name="source">source:B</str>
<str name="url">URL:B</str>
<date name="p_date">2012-08-08T11:02:01Z</date>
</doc>

I am keen on using Beautiful Soup (a version that still has BeautifulStoneSoup; I think that means prior to BS4) for parsing the docs.
I have used Beautiful Soup for HTML parsing, but somehow I am not able to find an efficient way to extract the contents of the tags.

I have written:

for tags in soup('doc'):
    print tags.renderContents()

I sense that I could force my way to the output (say, by ‘soup’ing each doc again), but I would appreciate an efficient way to extract the data.
The output I need is:

source:A
URL:A
2012-09-08T10:02:01Z
source:B
URL:B
2012-08-08T11:02:01Z

Thanks


1 Answer

Editorial Team · Answer 1 · 2026-06-17T14:56:40+00:00

Use an XML parser for this task instead; xml.etree.ElementTree is included with Python:

import urllib2
from xml.etree import ElementTree as ET

# `ET.fromstring()` expects a string containing XML and returns the root element:
# tree = ET.fromstring(solrdata)
# Use `ET.parse()` for a filename or an open file object, such as the one returned by urllib2:
tree = ET.parse(urllib2.urlopen(url))

for doc in tree.findall('.//doc'):
    for elem in doc:
        print elem.attrib['name'], elem.text
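
To get exactly the output listed in the question (element text only, one value per line), the same approach can be sketched as below. This is a minimal sketch, not a definitive implementation: it is written in Python 3 syntax, unlike the Python 2 snippets above, and the names `SAMPLE` and `extract_values` are made up for illustration. It also assumes the `<doc>` elements sit inside a single enclosing element, because a bare sequence of `<doc>` elements is not well-formed XML; the `<result>` wrapper here is added by hand around the sample from the question.

```python
from xml.etree import ElementTree as ET

# Sample data copied from the question. The <result> wrapper is an assumption
# added for this sketch so the text has a single root element and can be parsed.
SAMPLE = """<result>
<doc>
<str name="source">source:A</str>
<str name="url">URL:A</str>
<date name="p_date">2012-09-08T10:02:01Z</date>
</doc>
<doc>
<str name="source">source:B</str>
<str name="url">URL:B</str>
<date name="p_date">2012-08-08T11:02:01Z</date>
</doc>
</result>"""


def extract_values(xml_text):
    """Return the text of every child element of every <doc>, in document order."""
    root = ET.fromstring(xml_text)
    return [elem.text for doc in root.iter('doc') for elem in doc]


for value in extract_values(SAMPLE):
    print(value)
```

If the field names are needed as well, `elem.attrib['name']` is available on each child element, as in the loop above.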
