I am trying to parse webpages using urllib2, BeautifulSoup and Python 2.7. The problem

Question


Asked: June 7, 2026


I am trying to parse webpages using urllib2, BeautifulSoup and Python 2.7.

The problem lies upstream: each time I try to retrieve a new web page, I get the one I already retrieved. However, the pages are different in my web browser: see page 1 and page 2. Is there something wrong with the loop over page numbers?

Here is a code sample:

def main(page_number_max):
    import urllib2 as ul
    from BeautifulSoup import BeautifulSoup as bs

    base_url = 'http://www.senscritique.com/clement/collection/#page='

    for page_number in range(1, 1+page_number_max):
        url = base_url + str(page_number) + '/'
        html = ul.urlopen(url)
        bt = bs(html)

        for item in bt.findAll('div', 'c_listing-products-content xl'):
            item_name = item.findAll('h2', 'c_heading c_heading-5 c_bold')
            print str(item_name[0].contents[1]).split('\t')[11]

        print('End of page ' + str(page_number) + '\n')

if __name__ == '__main__':
    page_number_max = 2
    main(page_number_max)


1 Answer

Editorial Team · Answer 1 · 2026-06-07T02:48:59+00:00

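When you send an HTTP request to a server, everything after the “#” character (the URL fragment) is left out of the request. The part after “#” is only available to the browser. Every URL your loop builds therefore points to the same resource, which is why you keep getting the same page.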
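To make this concrete, here is a minimal sketch that uses only the standard library to strip the fragment from the URLs built by the loop in the question. The helper name `url_sent_to_server` is hypothetical, and the sketch only splits URL strings; it does not contact the site.

```python
try:
    from urllib.parse import urldefrag  # Python 3
except ImportError:
    from urlparse import urldefrag  # Python 2.7

base_url = 'http://www.senscritique.com/clement/collection/#page='


def url_sent_to_server(url):
    """Return the URL without its fragment (hypothetical helper).

    urldefrag splits a URL into (url_without_fragment, fragment);
    only the first part is meaningful to the server.
    """
    return urldefrag(url)[0]


# Build the same URLs as the loop in the question, then drop the fragment.
requested = [url_sent_to_server(base_url + str(n) + '/') for n in range(1, 3)]

for url in requested:
    print(url)
```

Both printed URLs are identical, so from the server's point of view the loop asks for the same page twice.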

If you open the developer tools in Chrome (or Firebug in Firefox), you will see that every time you change pages on senscritique.com, a request is sent to the server. That’s where the data you are looking for comes from.

I’m not going into details about what exactly to send in order to retrieve data from this page, because I think it’s not consistent with their TOS.
