I am trying to get a page from wikipedia. I have already added a

Asked: June 11, 2026 (2026-06-11T16:00:32+00:00)

I am trying to get a page from Wikipedia. I have already added a 'User-Agent' header to my request. However, when I open the page using urllib2.urlopen, I get the following page as the result:

ERROR: The requested URL could not be retrieved

While trying to retrieve the URL the following error was encountered:

Access Denied.

Access control configuration prevents your request from being allowed at this time. Please contact your service provider if you feel this is incorrect.

Here is the code I use to open the page:

import httplib
import logging
import traceback
import urllib2

logger = logging.getLogger(__name__)    # assumed: the original snippet does not show how logger is created

def get_site(request_user_link, request):
    # request_user_link: urllib2.Request for the URL entered by the user
    # request: request for the current page, used to read HTTP_USER_AGENT
    # (the header is needed for Wikipedia and other sites)
    request_user_link.add_header('User-Agent', str(request.META['HTTP_USER_AGENT']))
    try:
        response = urllib2.urlopen(request_user_link)
    except urllib2.HTTPError as err:
        logger.error('HTTPError = ' + str(err.code))
        response = None
    except urllib2.URLError as err:
        logger.error('URLError = ' + str(err.reason))
        response = None
    except httplib.HTTPException:
        logger.error('HTTPException')
        response = None
    except Exception:
        logger.error('generic exception: ' + traceback.format_exc())
        response = None
    return response

I pass the value of HTTP_USER_AGENT from the current user's request as the 'User-Agent' header of the request I send to Wikipedia.
If there are any other headers I need to add to this request, please let me know. Otherwise, please suggest an alternative solution.

EDIT: Please note that I was able to get the page successfully yesterday after I added the 'User-Agent' header. Today, I am getting this error page instead.

1 Answer

Editorial Team · Answer 1 · 2026-06-11T16:00:33+00:00

Wikipedia is not very forgiving if you violate its crawling rules. Because you first exposed your IP with the standard urllib2 user agent, you were flagged in the logs. When the logs were processed, your IP was banned. You can test this easily by running your script from another IP. Be careful, since Wikipedia is also known to block whole IP ranges.
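When testing from another IP, it may help to have the script recognize the denial page itself, since in the question it appears to come back as an ordinary response body. A minimal sketch, assuming the phrases quoted in the question are stable markers of that page; the helper name `looks_like_block_page` is hypothetical, and the code is written for Python 3 rather than the Python 2 of the question:

```python
# Phrases copied from the error page quoted in the question.
BLOCK_MARKERS = (
    "The requested URL could not be retrieved",
    "Access Denied",
)


def looks_like_block_page(body):
    """Return True if the response body contains every block-page marker."""
    if isinstance(body, bytes):
        # Response bodies arrive as bytes; decode leniently for matching.
        body = body.decode("utf-8", errors="replace")
    return all(marker in body for marker in BLOCK_MARKERS)
```

If this returns True from one IP and False from another for the same URL, that points to an IP-level block rather than a missing header.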

IP bans are usually temporary, but after multiple offenses they can become permanent.

Wikipedia also automatically bans known proxy servers. I suspect that they themselves parse anonymous proxy lists such as proxy-list.org and commercial proxy sites such as hidemyass.com for the IPs.

Wikipedia does this, of course, to protect its content from vandalism and spam. Please respect the rules.
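The answer does not spell the rules out. Two habits that are commonly recommended are to identify the client with a descriptive User-Agent and to space requests out. The following is only a sketch under those assumptions, written for Python 3 (where urllib2 became urllib.request); the `USER_AGENT` value, the `Throttle` class, and the function names are hypothetical and not taken from the question:

```python
import time
import urllib.request

# Hypothetical identity: replace with your own tool name and contact details.
USER_AGENT = "ExampleFetcher/0.1 (https://example.org/contact; admin@example.org)"


def build_request(url, user_agent=USER_AGENT):
    """Build a request that identifies the client instead of the library default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to keep min_interval between calls."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch(url, throttle, timeout=10):
    """Fetch one page politely; network access happens only here."""
    throttle.wait()
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

A caller would create one `Throttle` (for example `Throttle(1.0)` for at most one request per second, an arbitrary value) and pass it to every `fetch` call. The clock and sleep functions are injectable only so the timing logic can be checked without real delays.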

If possible, I suggest using a local copy of Wikipedia on your own servers. You can hammer that copy to your heart's content.
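Wikimedia publishes database dumps in the MediaWiki XML export format, which is one way to obtain such a local copy. Below is a rough sketch for streaming pages out of a dump that has already been downloaded and decompressed. The function names are hypothetical, and the element names (`page`, `title`, `text`) are assumed from the export format, so check them against your own file. It is written for Python 3:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        # Walk the finished <page> element and pick out its title and text.
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the finished page's contents to limit memory use
```

To use it, open the dump in binary mode and loop over `iter_pages(f)`. The file name depends on which dump you download.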
