I am writing a Python web app that will use content from Wikipedia. While testing some URL-fetching code, I was able to fetch both Google and Facebook (via Google App Engine services), but when I tried to fetch wikipedia.org, I got an exception. Can anyone confirm that Wikipedia rejects this kind of page request? How can Wikipedia tell my app apart from a regular user?
Code snippet (it’s Python!):
import os
import urllib2

from google.appengine.ext import webapp
from google.appengine.ext.webapp import template


class MainHandler(webapp.RequestHandler):
    def get(self):
        url = "http://wikipedia.org"
        try:
            # urlopen raises URLError (HTTPError is a subclass) on failure
            result = urllib2.urlopen(url)
        except urllib2.URLError, e:
            result = 'ahh the sky is falling'
        template_values = {
            'test': result,
        }
        path = os.path.join(os.path.dirname(__file__), 'index.html')
        self.response.out.write(template.render(path, template_values))
urllib2's default User-Agent is banned by Wikipedia, so the request gets a 403 HTTP response. That header is how Wikipedia tells your script apart from a browser. You should set a custom User-Agent for your application's requests.
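Here is a minimal sketch of setting that header. The User-Agent string, `build_request`, and `fetch` are hypothetical names made up for illustration; use a string that identifies your own app and a way to contact you. The import fallback is only there so the same snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2 as urlrequest  # Python 2, as in the question
except ImportError:
    import urllib.request as urlrequest  # Python 3 equivalent

# Hypothetical value: replace with your app's name and contact details.
USER_AGENT = "MyWikiApp/0.1 (https://example.org/mywikiapp; admin@example.org)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a custom User-Agent header."""
    return urlrequest.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Fetch url with the custom User-Agent; return the body or None on error."""
    try:
        response = urlrequest.urlopen(build_request(url))
        return response.read()
    except urlrequest.URLError:
        # HTTPError (e.g. a 403) is a subclass of URLError
        return None
```

In the handler from the question, you would then call something like `fetch("http://wikipedia.org")` instead of `urllib2.urlopen(url)` and pass the returned body to the template.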
Bonus link:
High-level Wikipedia Python clients: http://www.mediawiki.org/wiki/API:Client_code#Python