
The Archive Base


Asked by Editorial Team on May 11, 2026


I am trying to pull a list of resource/database names and IDs from a listing of resources that my school library has subscriptions to. There are pages listing the different resources, and I can use urllib2 to get the pages, but when I pass a page to BeautifulSoup, it truncates its tree just before the end of the entry for the first resource in the list. The problem seems to be in the image link used to add the resource to a search set. This is where things get cut off; here’s the HTML:

<a href='http://www2.lib.myschool.edu:7017/V/ACDYFUAMVRFJRN4PV8CIL7RUPC9QXMQT8SFV2DVDSBA5GBJCTT-45899?func=find-db-add-res&amp;resource=XYZ00618&amp;z122_key=000000000&amp;function-in=www_v_find_db_0' onclick='javascript:addToz122('XYZ00618','000000000','myImageXYZ00618','http://discover.lib.myschool.edu:8331/V/ACDYFUAMVRFJRN4PV8CIL7RUPC9QXMQT8SFV2DVDSBA5GBJCTT-45900');return false;'>     <img name='myImageXYZ00618' id='myImageXYZ00618' src='http://www2.lib.myschool.edu:7017/INS01/icon_eng/v-add_favorite.png' title='Add to My Sets' alt='Add to My Sets' border='0'> </a> 

And here is my Python code:

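    import urllib2
    from BeautifulSoup import BeautifulSoup

    # Fetch the A-Z database listing page
    page = urllib2.urlopen('http://discover.lib.myschool.edu:8331/V?func=find-db-1-title&mode=titles&scan_start=latp&scan_utf=D&azlist=Y&restricted=all')
    # Parse it and print the tree BeautifulSoup reconstructs (note the call parentheses)
    print BeautifulSoup(page).prettify()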

In BeautifulSoup’s output, the opening <a href...> shows up but the <img> doesn’t; the <a> is immediately closed, as are all the remaining open tags, all the way to </html>.

The only distinguishing trait I see for these ‘add to sets’ images is that they are the only ones to have name and id attributes. I can’t see why that would cause BeautifulSoup to stop parsing immediately, though.
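A browser’s source view may tidy up markup before showing it, so one way to look for a malformed tag is to scan the raw HTML string itself. The sketch below is a hypothetical diagnostic, not part of BeautifulSoup: the pattern and the helper name `find_glued_attributes` are assumptions, and it only catches one kind of malformation, a quoted attribute value that runs straight into the next attribute name with no whitespace in between.

```python
import re

# Hypothetical pattern: a complete quoted attribute value (single or double
# quotes) immediately followed by another attribute name and "=", with no
# whitespace in between, e.g. border='0'title='...'
GLUED_ATTR = re.compile(r"""=\s*(["'])[^"']*\1(?=[A-Za-z_:][\w.:-]*\s*=)""")


def find_glued_attributes(html):
    """Return a (line, column) pair, both 1-based, for the start of each
    attribute name that is glued to the preceding quoted value."""
    hits = []
    for match in GLUED_ATTR.finditer(html):
        pos = match.end()  # index of the first character of the glued name
        line = html.count("\n", 0, pos) + 1
        line_start = html.rfind("\n", 0, pos) + 1
        hits.append((line, pos - line_start + 1))
    return hits


raw = "<img name='myImageXYZ00618' border='0'title='Add to clipboard'>"
print(find_glued_attributes(raw))
```

Run it on the raw string from `page.read()` rather than on what the browser displays; an empty list means this particular problem is not present.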

Note: I am almost entirely new to Python, but seem to be understanding it all right.

Thank you for your help!


1 Answer

  1. Added an answer on May 11, 2026 at 10:34 am

    I was using Firefox’s “View Selection Source”, which apparently cleans up the HTML for me. When I viewed the original source, this is what I saw:

    <img name='myImageXYZ00618' id='myImageXYZ00618' src='http://www2.lib.myschool.edu:7017/INS01/icon_eng/v-add_favorite.png' alt='Add to My Sets' title='Add to My Sets' border='0'title='Add to clipboard PAIS International (CSA)' alt='Add to clipboard PAIS International (CSA)'> 

    There is no whitespace between border='0' and the following title attribute. By putting a space after the border='0' attribute, I can get BeautifulSoup to parse the page.
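A minimal sketch of that workaround, assuming the page is read into a string before parsing: insert the missing space with a regular expression, then hand the result to BeautifulSoup. The helper name `add_missing_attr_spaces` and the pattern are hypothetical; it is a targeted patch for this one malformation, not a general HTML repair.

```python
import re

# Hypothetical pattern: group 1 is a complete quoted attribute value that is
# immediately followed (no whitespace) by another attribute name and "=".
_GLUED_ATTR = re.compile(r"""(=\s*(["'])[^"']*\2)(?=[A-Za-z_:][\w.:-]*\s*=)""")


def add_missing_attr_spaces(html):
    """Insert a space between a quoted attribute value and a glued-on
    attribute name, e.g. border='0'title='x' -> border='0' title='x'."""
    return _GLUED_ATTR.sub(r"\1 ", html)


broken = "<img src='v-add_favorite.png' border='0'title='Add to My Sets'>"
print(add_missing_attr_spaces(broken))
# <img src='v-add_favorite.png' border='0' title='Add to My Sets'>
```

In the question’s Python 2 code, the parse call would become `BeautifulSoup(add_missing_attr_spaces(page.read()))`. Because the lookahead requires a following `name=`, a quote followed by ordinary text (such as the JavaScript arguments in the question’s `onclick`) is left alone.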


Related Questions

I'm trying to pull out a list of table names from a MySQL database.
I am working with 3 tables, trying to pull a list that match certain
I'm trying to pull events from my google calendar into a list view (most
Trying to pull rows from a table and list them in an ol, however,
I'm trying to pull data from the last row inserted (_id) of a database.
I'm trying to pull in images from Flickr using the phpFlickr library but the
I am trying to populate a dropdown list from database. In my view file
I'm trying to pull back a list of items that have a specific type
I'm trying to write code to pull a list of product items from a
I'm trying to pull values from a database for a web app where a


© 2021 The Archive Base. All Rights Reserved