The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: June 11, 2026 (2026-06-11T16:00:32+00:00)

I am trying to get a page from wikipedia. I have already added a

I am trying to get a page from Wikipedia. I have already added a ‘User-Agent’ header to my request. However, when I open the page using urllib2.urlopen, I get the following error page as a result:

ERROR: The requested URL could not be retrieved

While trying to retrieve the URL the following error was encountered:

  • Access Denied.

    Access control configuration prevents your request from being allowed at this time. Please contact your service provider if you feel this is incorrect.

Here is the code I use to open the page:

import httplib
import logging
import traceback
import urllib2

logger = logging.getLogger(__name__)

def get_site(request_user_link, request):
    # request_user_link: request for the URL entered by the user
    # request: request generated by the current page, used to read HTTP_USER_AGENT
    # Set the User-Agent header for Wikipedia and other sites
    request_user_link.add_header('User-Agent', str(request.META['HTTP_USER_AGENT']))
    try:
        response = urllib2.urlopen(request_user_link)
    except urllib2.HTTPError as err:
        logger.error('HTTPError = ' + str(err.code))
        response = None
    except urllib2.URLError as err:
        logger.error('URLError = ' + str(err.reason))
        response = None
    except httplib.HTTPException:
        logger.error('HTTPException')
        response = None
    except Exception:
        logger.error('generic exception: ' + traceback.format_exc())
        response = None
    return response

I pass the value of HTTP_USER_AGENT from the current user's request as the “User-Agent” header of the request I send to Wikipedia.
If there are any other headers I need to add to this request, please let me know. Otherwise, please suggest an alternative solution.

EDIT: Please note that I was able to get the page successfully yesterday, after I added the ‘User-Agent’ header. Today, I am getting this error page instead.

1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 4:00 pm (2026-06-11T16:00:33+00:00)

    Wikipedia is not very forgiving if you violate its crawling rules. Because your first requests exposed your IP with the standard urllib2 User-Agent, your IP was flagged in the logs. When the logs were ‘processed’, your IP was banned. You can test this easily by running your script from another IP. Be careful, since Wikipedia is also known to block IP ranges.
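
To avoid ever sending the library's default User-Agent, and to recognise the block page when it comes back, something like the following minimal sketch may help. It is written for Python 3 (`urllib.request`, the successor to `urllib2`). The `USER_AGENT` string and the helper names `build_request`, `looks_blocked` and `fetch` are hypothetical, and the check simply assumes a blocked request returns either status 403 or the error text quoted in the question.

```python
import urllib.error
import urllib.request

# Hypothetical identifying string; replace with your own tool name and contact.
USER_AGENT = "ExampleFetcher/0.1 (contact: admin@example.org)"

# Text taken from the error page quoted in the question.
BLOCK_MARKER = "Access control configuration prevents your request"


def build_request(url, user_agent=USER_AGENT):
    """Build a Request that always carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def looks_blocked(status, body):
    """Heuristic only: a 403 status or the quoted error text suggests a block."""
    return status == 403 or BLOCK_MARKER in body


def fetch(url):
    """Return (status, body, blocked) for a URL, including HTTP error responses."""
    req = build_request(url)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError still carries the status code and the error page body.
        status = err.code
        body = err.read().decode("utf-8", "replace")
    return status, body, looks_blocked(status, body)
```

Running `fetch` from two different networks and comparing the `blocked` flag is one way to check whether the ban is tied to a single IP.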

    IP bans are usually temporary, but after multiple offenses a ban can become permanent.

    Wikipedia also automatically bans known proxy servers. I suspect that they themselves parse anonymous proxy lists like proxy-list.org and commercial proxy sites like hidemyass.com for the IPs.

    Wikipedia does this, of course, to protect its content from vandalism and spam. Please respect the rules.

    If possible, I suggest using a local copy of Wikipedia on your own servers. That copy you can abuse to your heart's content.
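
As a rough illustration of working from a local copy, the sketch below reads pages out of an XML dump instead of requesting them over HTTP. It assumes the dump uses the MediaWiki XML export layout (`<page>` elements containing `<title>` and `<revision><text>`); the helper names and the tiny `SAMPLE` document are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a real dump file, assuming the MediaWiki export layout.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Hello '''world'''</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for each <page> element in the stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the parsed page; real dumps are very large


for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE)):
    print(page_title, len(wikitext))
```

For a compressed dump, a file object from `bz2.open(path, "rb")` can be passed to `iter_pages` in place of the in-memory sample.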

© 2021 The Archive Base. All Rights Reserved