I am trying to parse the XML returned by search engine APIs (Bing, Yahoo & Blekko)

Question

Asked: May 23, 2026 (2026-05-23T17:59:06+00:00)

I am trying to parse the XML returned by search engine APIs (Bing, Yahoo & Blekko). The returned XML (for sample search query ‘sushi’) from Blekko takes the form:

<rss version="2.0">
<channel>
    <title>blekko | rss for &quot;sushi/rss /ps=100&quot;</title>
    <link>http://blekko.com/?q=sushi%2Frss+%2Fps%3D100</link>
    <description>Blekko search for &quot;sushi/rss /ps=100&quot;</description>
    <language>en-us</language>
    <copyright>Copyright 2011 Blekko, Inc.</copyright>
    <docs>http://cyber.law.harvard.edu/rss/rss.html</docs>
    <webMaster>webmaster@blekko.com</webMaster>
    <rescount>3M</rescount>
    <item>
        <title>Sushi - Wikipedia</title>
        <link>http://en.wikipedia.org/wiki/Sushi</link>
        <guid>http://en.wikipedia.org/wiki/Sushi</guid>
        <description>Article about sushi, a food made of vinegared rice combined with various toppings or fillings.  Sushi ( &#x3059;&#x3057;&#x3001;&#x5bff;&#x53f8;, &#x9ba8;, &#x9b93;, &#x5bff;&#x6597;, &#x5bff;&#x3057;, &#x58fd;&#x53f8;.</description>
    </item>
</channel>
</rss>

The section of Python code that extracts the required search result data is:

for counter in range(100):
    try:
        for item in BlekkoSearchResultsXML.getElementsByTagName('item'):
            Blekko_PageTitle = item.getElementsByTagName('title')[counter].toxml(encoding="utf-8")
            Blekko_PageDesc = item.getElementsByTagName('description')[counter].toxml(encoding="utf-8")
            Blekko_DisplayURL = item.getElementsByTagName('guid')[counter].toxml(encoding="utf-8")
            Blekko_URL = item.getElementsByTagName('link')[counter].toxml(encoding="utf-8")
            print "<h2>" + Blekko_PageTitle + "</h2><br />"
            print Blekko_PageDesc + "<br />"
            print Blekko_DisplayURL + "<br />"
            print Blekko_URL + "<br />"
    except IndexError:
        break

The code does not extract the page title of each search result returned, but it does extract the rest of the information.

Furthermore, if I do not have the code:

print "<title>Page title to appear on browser tab</title>"

somewhere in the script, the title from the first search result is taken as the page title (i.e. the page appears with the title ‘Sushi – Wikipedia’ in the browser). If I do have a page title, the code still does not extract the page title from the search result.

The same code (with different tag names etc.) has the same problem with the Yahoo search API, but works fine with the Bing search API.


1 Answer

Editorial Team · Answer 1 · 2026-05-23T17:59:07+00:00

The .toxml() method returns the XML for the element, including its delimiting tags, so I guess the script ends up printing something like this:

<h2><title>...</title></h2><br />
<description>...</description><br />
<guid>...</guid><br />

The title element is therefore interpreted as the page’s title, unless you specify your own in advance. Other elements are unknown to the browser, and it just displays their content as is.
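
One possible way around this is to read the text inside each element instead of calling .toxml(), and to escape that text before embedding it in HTML. The following is only a rough sketch under some assumptions: it targets Python 3 and the standard-library xml.dom.minidom, the helper names child_text and render_items are made up for illustration, and SAMPLE_RSS is a trimmed copy of the feed from the question. It also takes the first matching tag inside each item rather than indexing by a counter.

```python
from html import escape
from xml.dom import minidom

# Trimmed copy of the Blekko feed from the question (illustrative only).
SAMPLE_RSS = """<rss version="2.0">
<channel>
    <title>blekko | rss for &quot;sushi/rss /ps=100&quot;</title>
    <item>
        <title>Sushi - Wikipedia</title>
        <link>http://en.wikipedia.org/wiki/Sushi</link>
        <guid>http://en.wikipedia.org/wiki/Sushi</guid>
        <description>Article about sushi.</description>
    </item>
</channel>
</rss>"""


def child_text(parent, tag_name):
    """Return the text of the first <tag_name> under parent, or '' if absent."""
    elements = parent.getElementsByTagName(tag_name)
    if not elements:
        return ""
    # Join only text and CDATA children, so no markup is carried over.
    return "".join(
        node.data
        for node in elements[0].childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )


def render_items(xml_text):
    """Build HTML for every <item>, using element text rather than toxml()."""
    document = minidom.parseString(xml_text)
    lines = []
    for item in document.getElementsByTagName("item"):
        # escape() keeps characters such as < and & from being read as HTML.
        lines.append("<h2>" + escape(child_text(item, "title")) + "</h2>")
        lines.append(escape(child_text(item, "description")) + "<br />")
        lines.append(escape(child_text(item, "guid")) + "<br />")
        lines.append(escape(child_text(item, "link")) + "<br />")
    return "\n".join(lines)


print(render_items(SAMPLE_RSS))
```

With this approach the heading contains only the title text, so no stray title element reaches the browser. The same idea should carry over to the Yahoo results with the tag names changed, though that has not been checked against the real API output.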
