I am trying to pull a list of resource/database names and IDs from a

Question

Asked: May 11, 2026 (2026-05-11T10:34:56+00:00)

I am trying to pull a list of resource/database names and IDs from a listing of resources that my school library subscribes to. There are pages listing the different resources, and I can use urllib2 to fetch them, but when I pass a page to BeautifulSoup, it truncates its tree just before the end of the entry for the first resource in the list. The problem seems to be in the image link used to add the resource to a search set. This is where things get cut off. Here's the HTML:

<a href='http://www2.lib.myschool.edu:7017/V/ACDYFUAMVRFJRN4PV8CIL7RUPC9QXMQT8SFV2DVDSBA5GBJCTT-45899?func=find-db-add-res&amp;resource=XYZ00618&amp;z122_key=000000000&amp;function-in=www_v_find_db_0' onclick='javascript:addToz122('XYZ00618','000000000','myImageXYZ00618','http://discover.lib.myschool.edu:8331/V/ACDYFUAMVRFJRN4PV8CIL7RUPC9QXMQT8SFV2DVDSBA5GBJCTT-45900');return false;'>     <img name='myImageXYZ00618' id='myImageXYZ00618' src='http://www2.lib.myschool.edu:7017/INS01/icon_eng/v-add_favorite.png' title='Add to My Sets' alt='Add to My Sets' border='0'> </a>

And here is my Python code:

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen('http://discover.lib.myschool.edu:8331/V?func=find-db-1-title&mode=titles&scan_start=latp&scan_utf=D&azlist=Y&restricted=all')
# prettify is a method, so it has to be called to get the formatted tree
print BeautifulSoup(page).prettify()

In BeautifulSoup’s version, the opening <a href...> shows up, but the <img> doesn’t, and the <a> is immediately closed, as are the rest of the open tags, all the way to </html>.

The only distinguishing trait I see for these ‘add to sets’ images is that they are the only ones to have name and id attributes. I can’t see why that would cause BeautifulSoup to stop parsing immediately, though.

Note: I am almost entirely new to Python, but seem to be understanding it all right.

Thank you for your help!


1 Answer

score 0 · Answer 1 · 2026-05-11T10:34:56+00:00

I was using Firefox’s ‘view selection source’, which apparently cleans up the HTML for me. When I viewed the original page source, this is what I saw:

<img name='myImageXYZ00618' id='myImageXYZ00618' src='http://www2.lib.myschool.edu:7017/INS01/icon_eng/v-add_favorite.png' alt='Add to My Sets' title='Add to My Sets' border='0'title='Add to clipboard PAIS International (CSA)' alt='Add to clipboard PAIS International (CSA)'>

There is no space between border='0' and the second title attribute. By putting a space after the border='0' attribute, I can get BeautifulSoup to parse the page.
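The fix described above can be applied to the fetched page text before it is handed to the parser. What follows is only a rough sketch: the function name `fix_missing_attr_spaces` and the regular expression are my own assumptions, not part of BeautifulSoup, and the pattern is a heuristic that could also touch similar-looking text inside scripts. It inserts a space wherever a quoted attribute value is immediately followed by another `name=`.

```python
import re

# Heuristic pattern (an assumption, not a full HTML tokenizer):
# a quoted attribute value directly followed by another attribute name and "=".
ATTR_VALUE_THEN_NAME = re.compile(
    r"""(=\s*(?:'[^']*'|"[^"]*"))   # "=" plus a single- or double-quoted value
        (?=[A-Za-z_:][-\w:.]*\s*=)  # next comes an attribute name and "="
    """,
    re.VERBOSE,
)


def fix_missing_attr_spaces(html):
    """Insert the missing space between a quoted value and the next attribute."""
    return ATTR_VALUE_THEN_NAME.sub(r"\1 ", html)


if __name__ == "__main__":
    # Shortened version of the broken <img> tag from the original page source.
    sample = "<img id='myImageXYZ00618' border='0'title='Add to clipboard'>"
    print(fix_missing_attr_spaces(sample))
```

The cleaned string, rather than the raw response, would then be passed to BeautifulSoup. Well-formed tags are left unchanged because the lookahead only matches when the attribute name starts right after the closing quote.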
