I’m trying to scrape a website where the URL is redirected, however programmatically trying

Question

Asked: May 17, 2026 (2026-05-17T07:00:46+00:00)

I’m trying to scrape a website where the URL is redirected, but requesting it programmatically gives me a 403 (Forbidden) error. If I paste the URL into a browser, the browser follows the redirect without any problem.

As a simple example, I’m trying to go to:
http://en.wikipedia.org/w/index.php?title=Mike_tyson

I’ve tried urllib2 and mechanize, but neither works. I am fairly new to web programming and was wondering whether there are other tricks I need in order to follow the redirect.

Thanks!

EDIT

Okay, so this is really messed up. I was originally looking into alternative methods because I was trying to scrape an mp3. I was managing to download the mp3 successfully, but the file was all mangled.

It turns out this was somehow related to downloading it on Windows, or to my current Python version.
I tested the code on my Ubuntu distro and the mp3 file downloaded perfectly fine.

So I just used a simple urllib2.urlopen and it worked perfectly!

I wonder why downloading on Windows mangled the mp3?
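
One possible explanation, offered as a guess since the original download code is not shown: on Windows, a file opened in text mode has its newline bytes translated on write, which corrupts binary data such as an mp3, while on Linux text and binary mode write the same bytes. A minimal sketch of writing the downloaded bytes in binary mode follows; `save_binary` and `read_binary` are hypothetical helper names used only for illustration.

```python
import os
import tempfile


def save_binary(data, path):
    """Write raw bytes to path with no newline translation."""
    # "wb" (binary mode) is the important part. Text mode on Windows
    # rewrites newline bytes, which would damage an mp3 file.
    with open(path, "wb") as out:
        out.write(data)


def read_binary(path):
    """Read the file back as raw bytes."""
    with open(path, "rb") as src:
        return src.read()


if __name__ == "__main__":
    # Stand-in bytes, not a real mp3; they include newline and
    # carriage-return bytes that text mode could alter.
    sample = b"ID3\x03\x00\n\r\n\x1a\xff\xfb"
    target = os.path.join(tempfile.mkdtemp(), "sample.mp3")
    save_binary(sample, target)
    print(read_binary(target) == sample)
```

If the file size on disk differs from the number of bytes received, text-mode translation is a likely suspect.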

1 Answer

Editorial Team · Answer 1 · 2026-05-17T07:00:47+00:00

Try telling mechanize not to honor robots.txt. Also, consider changing the User-Agent HTTP header:

>>> import mechanize
>>> br = mechanize.Browser()
>>> br.set_handle_robots(False)
>>> br.addheaders = [('User-Agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)')]

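Web servers that filter on the User-Agent header will now treat the request as if it came from MS Internet Explorer 6 rather than a bot. Because robots.txt handling is turned off, mechanize will also fetch pages that robots.txt disallows, so your bot will continue to work until the site blocks it some other way.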

>>> br.open('http://en.wikipedia.org/w/index.php?title=Mike_tyson')
<response_seek_wrapper at 0x... whose wrapped object = <closeable_response at 0x... whose fp = <socket._fileobject object at 0x...>>> #doctest: +ELLIPSIS
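
If mechanize is not available, the same User-Agent idea can be sketched with the standard library alone. This assumes Python 3, where the functionality of urllib2 lives in `urllib.request`; `build_request` and `fetch` are hypothetical helper names, and the User-Agent string is simply the one from the mechanize example. Whether a given site accepts it is not guaranteed.

```python
import urllib.request

# Same browser-like User-Agent string as in the mechanize example.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object carrying a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Open the URL and return (final_url, body_bytes).

    urlopen follows HTTP redirects by default, so final_url may
    differ from the URL that was requested.
    """
    with urllib.request.urlopen(build_request(url)) as response:
        return response.geturl(), response.read()


if __name__ == "__main__":
    final_url, body = fetch("http://en.wikipedia.org/w/index.php?title=Mike_tyson")
    print(final_url, len(body))
```

Comparing `final_url` with the requested URL is a quick way to confirm that the redirect was actually followed.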
