So I’m trying to parse from this website http://dl.acm.org/dl.cfm . This website doesn’t allow


Asked: June 8, 2026

I’m trying to parse pages from http://dl.acm.org/dl.cfm. The site doesn’t allow web scrapers, so I get an HTTP 403 Forbidden error.

I’m using Python, so I tried mechanize to automate filling in the form and clicking the button, but I got the same error.

I can’t even open the HTML page with urllib2.urlopen(); it fails with the same error.

Can anyone help me with this problem?

1 Answer

Editorial Team · Answer 1 · 2026-06-08T11:55:32+00:00

If the website doesn’t allow web scrapers/bots, you shouldn’t be using bots on the site to begin with.

But to answer your question: I suspect the website is blocking urllib2’s default User-Agent. You’ll probably have to craft your own request and set the User-Agent header to that of a known browser.

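import urllib2  # Python 2 only

# Send a browser-like User-Agent instead of urllib2's default one.
headers = {"User-Agent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"}
req = urllib2.Request("http://dl.acm.org/dl.cfm", headers=headers)
response = urllib2.urlopen(req)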
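
The snippet above uses urllib2, which exists only in Python 2; in Python 3 the same functionality lives in urllib.request and urllib.error. A minimal sketch of the same idea for Python 3 might look like the following. The helper names build_request and fetch, the BROWSER_UA constant, and the 10-second timeout are illustrative assumptions, not part of the original answer, and there is no guarantee the site still accepts this User-Agent string.

```python
import urllib.error
import urllib.request

# Same User-Agent string as in the answer; whether the server accepts it
# today is an assumption.
BROWSER_UA = "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Return the response body as bytes, or None if the server sends an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # A 403 here means the server still refuses the request.
        print("HTTP error:", err.code, err.reason)
        return None
```

Calling fetch("http://dl.acm.org/dl.cfm") would then either return the page bytes or print the status code the server answered with.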

EDIT: I tested this and it works. The site actively blocks requests based on the User-Agent header to stop badly made bots that ignore robots.txt.
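
Since robots.txt is what a well-behaved bot is expected to honor, it may be worth checking it before fetching anything. Here is a rough sketch using Python 3's standard urllib.robotparser module. The is_allowed helper, the "my-bot" agent name, and the SAMPLE_RULES lines are made up for illustration; they are not the real rules published by dl.acm.org.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only to demonstrate the check.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /private/",
]


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules already held in memory
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_RULES, "my-bot", "http://example.com/index.html"))
print(is_allowed(SAMPLE_RULES, "my-bot", "http://example.com/private/page.html"))
```

Against a live site you would instead call set_url() with the address of the site's robots.txt and then read() on the parser, which downloads and parses the file before can_fetch() is consulted.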
