
The Archive Base


Editorial Team
Asked: May 27, 2026


I’m trying to use mechanize to open multiple pages that follow a certain format. I want to start from a given page and have mechanize follow every link that has a certain class or contains a certain piece of text in its URL. For example, the root URL would be something like

http://hansard.millbanksystems.com/offices/prime-minister

and I want to follow every link on the page that has a format such as

<li class='office-holder'><a href="http://hansard.millbanksystems.com/people/mr-tony-blair">Mr Tony Blair</a> May  2, 1997 - June 27, 2007</li>

In other words, I want to follow every link that has the class ‘office-holder’ or that has /people/ in the URL. I’ve tried the following code, but it hasn’t worked.

import mechanize

br = mechanize.Browser()
response = br.open("http://hansard.millbanksystems.com/offices/prime-minister")
links = br.links(url_regex="/people/")

print links

I’m trying to print the links so I can make sure that I’m getting the right links/information before writing any more code. The error(?) I get from this is:

<generator object _filter_links at 0x10121e6e0>

Any pointers or tips are appreciated.


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 9:03 am

    That’s not an error; it means that Browser.links() returns a generator object (a kind of iterator) rather than a list.
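To see where the `<generator object _filter_links ...>` text comes from, here is a minimal sketch of a filtering generator in the same spirit. `Link` and `filter_links` are hypothetical stand-ins, not mechanize's actual implementation; the point is only that a function containing `yield` returns a generator, and printing a generator shows its repr rather than its contents. Written for Python 3.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a mechanize Link; only .url and .text are modelled.
Link = namedtuple("Link", ["url", "text"])


def filter_links(links, url_regex):
    """Hypothetical helper: lazily yield the links whose URL matches url_regex."""
    pattern = re.compile(url_regex)
    for link in links:
        if pattern.search(link.url):
            yield link


page_links = [
    Link("http://hansard.millbanksystems.com/people/mr-tony-blair", "Mr Tony Blair"),
    Link("http://hansard.millbanksystems.com/offices", "Offices"),
]

matches = filter_links(page_links, "/people/")
print(matches)                   # shows the generator's repr, not the links
print([l.url for l in matches])  # consumes the generator and prints the matching URL
```

Nothing is filtered until something iterates over `matches`; the list comprehension is what actually runs the loop inside `filter_links`.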

    An iterator is an object that acts “like a list”, meaning that you can do things like

    for link in links:
        print link
    

    and so on. But you can only access things in whatever order it defines; you can’t necessarily do links[5], and once you’ve gone through the iterator, it’s used up.
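Both limitations can be seen with a tiny generator. This is a sketch in Python 3; `numbers` is a made-up example, not anything from mechanize.

```python
def numbers():
    """A tiny generator used only to show iterator behaviour."""
    yield 1
    yield 2
    yield 3


gen = numbers()

try:
    gen[0]  # generators do not support indexing
except TypeError as exc:
    print("indexing failed: %s" % exc)

first_pass = list(gen)   # consumes every item
second_pass = list(gen)  # the generator is now exhausted
print(first_pass)   # [1, 2, 3]
print(second_pass)  # []
```

The failed indexing attempt does not advance the generator, so the first `list()` call still sees all three values.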

    A generator is, for most purposes, just an iterator that doesn’t necessarily know all its results in advance. Generator expressions such as (l.url for l in links) produce them, and you can also write very simple functions that return generators with the yield keyword:

    def odds():
        x = 1
        while True:
            yield x
            x += 2

    gen = odds()
    next(gen)  # returns 1
    next(gen)  # returns 3
    

    This is a Good Thing because it means that you don’t have to store all of your data in memory at once (which for odds() would be impossible…), and if you only need the first few elements of the result you don’t have to bother computing the rest. The itertools module has a bunch of handy functions for dealing with iterators.
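For example, `itertools.islice` and `itertools.takewhile` can pull just the first few values out of the infinite `odds()` generator without ever trying to compute the rest. A sketch in Python 3:

```python
from itertools import islice, takewhile


def odds():
    """Yield 1, 3, 5, ... forever; each value is computed only when requested."""
    x = 1
    while True:
        yield x
        x += 2


# islice stops after five items, so the infinite loop inside odds() never runs away.
first_five = list(islice(odds(), 5))
print(first_five)  # [1, 3, 5, 7, 9]

# takewhile stops at the first value that fails the test (9 is not below 8).
below_eight = list(takewhile(lambda n: n < 8, odds()))
print(below_eight)  # [1, 3, 5, 7]
```

Calling `list(odds())` directly would never return, which is exactly why these lazy helpers exist.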


    Anyway, if you just want to print out the contents of links, you can turn it into a list with the list() function (which takes an iterable and returns a list of its elements):

    print list(links)
    

    or make a list of strings with a list comprehension:

    print [l.url for l in links]
    

    or walk over its elements and print them out:

    for l in links:
        print l.url
    

    But note that after you do this, links will be “exhausted”, so if you want to actually do anything else with it, you’ll need to call br.links() again.

    Maybe the simplest option is to immediately turn it into a list and not worry about it being an iterator at all:

    links = list(br.links(url_regex="/people/"))
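Once the results are in a list, they can be counted, indexed and walked as often as you like. In this Python 3 sketch, `people_links` is a hypothetical stand-in for `br.links(url_regex="/people/")` and the example.com URLs are placeholders.

```python
def people_links():
    """Hypothetical stand-in for br.links(url_regex="/people/"); yields placeholder URLs."""
    yield "http://example.com/people/a"
    yield "http://example.com/people/b"


links = list(people_links())  # consume the generator once and keep the results

print(len(links))  # 2: a list knows its length
print(links[1])    # indexing works on a list

# A list can be walked repeatedly; nothing gets "used up".
first_walk = [url for url in links]
second_walk = [url for url in links]
print(first_walk == second_walk)  # True
```

The trade-off is memory: the whole result is stored at once, which is fine for the handful of links on one page.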
    

    Also, you’re obviously not yet getting links that have the class you want. There might be some mechanize trick to do an “or” here, but a nifty way to do it using sets and generator expressions would be something like this:

    links = set(l.url for l in br.links(url_regex='/people/'))
    # get_links_with_class is a placeholder, not a real mechanize method (see below)
    links.update(l.url for l in br.get_links_with_class('office-holder'))
    

    Obviously replace get_links_with_class with the real way to get those links. Then you’ll end up with a set of all the link URLs that have /people/ in their URL and/or have the class office-holder, with no duplicates. (Note that you can’t put the Link objects in the set directly because they’re not hashable.)
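One possible way to fill in that placeholder, sketched with the standard library's `html.parser` instead of mechanize (a swapped-in technique, Python 3). In the sample markup from the question, the `office-holder` class sits on the enclosing `<li>`, not on the `<a>` itself, so the parser tracks whether it is currently inside such an `<li>`. The class name `OfficeHolderLinkParser` and the `/offices/example` and `/people/example` URLs are hypothetical.

```python
from html.parser import HTMLParser


class OfficeHolderLinkParser(HTMLParser):
    """Collect hrefs that contain '/people/' or whose <a> sits inside
    an <li class="office-holder"> element."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_office_holder = False
        self.urls = set()  # a set, so duplicates are dropped automatically

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            classes = (attrs.get("class") or "").split()
            self.in_office_holder = "office-holder" in classes
        elif tag == "a":
            href = attrs.get("href")
            if href and (self.in_office_holder or "/people/" in href):
                self.urls.add(href)

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_office_holder = False


sample = (
    "<ul>"
    "<li class='office-holder'><a href=\"http://hansard.millbanksystems.com/people/mr-tony-blair\">"
    "Mr Tony Blair</a> May  2, 1997 - June 27, 2007</li>"
    "<li class='office-holder'><a href=\"/offices/example\">Example</a></li>"
    "<li><a href=\"/about\">About</a></li>"
    "</ul>"
    "<p><a href=\"/people/example\">Someone</a></p>"
)

parser = OfficeHolderLinkParser()
parser.feed(sample)
parser.close()
print(sorted(parser.urls))
```

With mechanize you would feed the parser the page HTML (for example the decoded result of `response.read()`), resolve relative hrefs against the page URL with `urllib.parse.urljoin`, and then open each URL in the set with `br.open()`.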


Related Questions

Basically, what I'm trying to create is a page of div tags, each has
That's pretty much it. I'm using Nokogiri to scrape a web page what has
I want to count how many characters a certain string has in PHP, but
I have a French site that I want to parse, but am running into
I'm using v2.0 of ClassTextile.php, with the following call: $testimonial_text = $textile->TextileRestricted($_POST['testimonial']); ... and
I have thousands of HTML files to process using Groovy/Java and I need to
I am trying to loop through a bunch of documents I have to put
I have a string like this: La Torre Eiffel paragonata all&#8217;Everest What PHP function
I'm making a simple page using Google Maps API 3. My first. One marker
I am trying to understand how to use SyndicationItem to display feed which is

© 2021 The Archive Base. All Rights Reserved