I’m having trouble using urllib2.urlopen on a particular URL on GAE. When I run the same code as a plain Python script from Eclipse, I can retrieve the website data, but when I run it as a GAE app, I get ‘Status 500 Internal Server Error’.
In the plain Python script, the following code works fine:
import urllib
import urllib2

query2 = {'ORIGIN': 'LOS', 'DESTINATION': 'ABV', 'DAY': '23',
          'MONTHYEAR': 'JAN2012', 'RDAY': '-1', 'RMONTHYER': '-1',
          'ADULTS': '1', 'KIDS': '0', 'INFANTS': '0', 'CURRENCY': 'NGN',
          'DIRECTION': 'SEARCH', 'AGENT': '111210135256.41.138.183.192.29025'}
encoded = urllib.urlencode(query2)
url3 = 'http://www.flyaero.com/cgi-bin/airkiosk/I7/171015'
# Passing data makes urlopen send a POST
request = urllib2.urlopen(url3, encoded)
print 'RESPONSE:', request
print 'URL :', request.geturl()
headers = request.info()
print 'DATE :', headers['date']
print 'HEADERS :'
print '---------'
print headers
data = request.read()
print 'LENGTH :', len(data)
print 'DATA :'
print '---------'
print data
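To check whether the search POST is simply slow, you could time the same call in the standalone script and compare the result with the deadline GAE applies. The helper name timed_call below is hypothetical, not part of the original code; it is a minimal sketch that works on both Python 2 and 3.

```python
import time


def timed_call(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start

# Usage in the standalone script above (Python 2, same names):
# request, elapsed = timed_call(urllib2.urlopen, url3, encoded)
# print 'ELAPSED :', elapsed
```

If the elapsed time is several seconds, that alone would be enough to hit a URLFetch timeout on GAE.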
This works, but the GAE version does not. This is the GAE code:
import urllib
import urllib2

from google.appengine.ext import webapp


class MainPage(webapp.RequestHandler):
    def get(self):
        query = {'ORIGIN': 'LOS', 'DESTINATION': 'ABV', 'DAY': '23',
                 'MONTHYEAR': 'JAN2012', 'RDAY': '-1', 'RMONTHYER': '-1',
                 'ADULTS': '1', 'KIDS': '0', 'INFANTS': '0', 'CURRENCY': 'NGN',
                 'DIRECTION': 'SEARCH', 'AGENT': '111210135256.41.138.183.192.29025'}
        urlkey = 'http://www.flyaero.com/cgi-bin/airkiosk/I7/181002i?AJ=2&LANG=EN'
        urlsearch = 'http://www.flyaero.com/cgi-bin/airkiosk/I7/171015'
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        header = {'User-Agent': user_agent}  # note: never passed to urlopen below

        # First request (GET): this one succeeds
        try:
            request = urllib2.urlopen(urlkey)
            data = request.read()
            info = request.info()
        except urllib2.URLError, e:
            print 'error code: ', e
        print 'INFO:'
        print info
        print ''
        print 'Old key is: ' + query['AGENT']
        print 'Agent key is ' + query['AGENT']
        encoded = urllib.urlencode(query)
        print 'encoded data', encoded
        print ''
        print 'web data'
        print ''

        # Second request (POST with the encoded form): this one fails
        try:
            request2 = urllib2.urlopen(urlsearch, encoded)
            data2 = request2.read()
            info2 = request2.info()
        except urllib2.URLError, e:
            print 'error code: ', e
        print 'INFO:'
        print info2
        print ''
        print 'DATA: '
        print data
There are two calls to urllib2.urlopen. The first one works, but the second one (the POST to urlsearch) returns error 500, and the try/except block doesn’t catch the error.
This is the message printed by the request.info() call:
Status: 500 Internal Server Error
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Content-Length: 1662
I’m not on the developer server; I’m developing with Eclipse and running from localhost on my system. This is the error message that appears in the browser and in the Eclipse console:
WARNING 2011-12-10 17:29:31,703 urlfetch_stub.py:405] Stripped prohibited headers from URLFetch request: ['Host']
WARNING 2011-12-10 17:29:33,075 urlfetch_stub.py:405] Stripped prohibited headers from URLFetch request: ['Content-Length', 'Host']
ERROR 2011-12-10 17:29:38,305 __init__.py:463] ApplicationError: 2 timed out
Traceback (most recent call last):
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\webapp\__init__.py", line 700, in __call__
    handler.get(*groups)
  File "C:\Users\TIOLUWA\Documents\CODES\Elipse\FlightShop\flightshop.py", line 124, in get
    request2 = urllib2.urlopen(urlsearch, encoded)
  File "C:\python25\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "C:\python25\lib\urllib2.py", line 381, in open
    response = self._open(req, data)
  File "C:\python25\lib\urllib2.py", line 399, in _open
    '_open', req)
  File "C:\python25\lib\urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "C:\python25\lib\urllib2.py", line 1107, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\python25\lib\urllib2.py", line 1080, in do_open
    r = h.getresponse()
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\dist\httplib.py", line 213, in getresponse
    self._allow_truncated, self._follow_redirects)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\urlfetch.py", line 260, in fetch
    return rpc.get_result()
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\apiproxy_stub_map.py", line 592, in get_result
    return self.__get_result_hook(self)
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\api\urlfetch.py", line 358, in _get_fetch_result
    raise DownloadError(str(err))
DownloadError: ApplicationError: 2 timed out
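As the exception indicates, it’s failing because the outgoing HTTP request timed out. The log timestamps fit this: the POST (the request with Content-Length) was sent at 17:29:33 and the timeout was reported at 17:29:38, about five seconds later, which is consistent with URLFetch’s default 5-second deadline. This also explains why your try/except didn’t catch it: the exception raised is google.appengine.api.urlfetch.DownloadError, which is not a urllib2.URLError, so the except urllib2.URLError clause doesn’t match it. Instead of using urllib2, use URLFetch directly, and pass the deadline argument to the fetch function with a longer deadline.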
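A minimal sketch of that approach, assuming the Python 2 App Engine SDK’s google.appengine.api.urlfetch module. The helper names build_fetch_kwargs and search, and the 10-second deadline, are hypothetical choices rather than part of the original code; the maximum deadline you may use depends on your runtime and request type. The payload builder is kept separate so it can be checked without the SDK.

```python
try:
    from urllib import urlencode          # Python 2 (App Engine runtime)
except ImportError:
    from urllib.parse import urlencode    # Python 3, e.g. for local testing


def build_fetch_kwargs(query, deadline=10):
    """Keyword arguments for a form-encoded POST via urlfetch.fetch()."""
    return {
        # Sorting only makes the payload deterministic; the server
        # presumably does not depend on parameter order.
        'payload': urlencode(sorted(query.items())),
        'headers': {'Content-Type': 'application/x-www-form-urlencoded'},
        # Seconds to wait before urlfetch gives up (assumed value).
        'deadline': deadline,
    }


def search(url, query, deadline=10):
    """POST the search form; return (status_code, body) or (None, None)."""
    # Only importable inside the App Engine SDK / runtime.
    from google.appengine.api import urlfetch
    kwargs = build_fetch_kwargs(query, deadline)
    try:
        result = urlfetch.fetch(url, method=urlfetch.POST, **kwargs)
    except urlfetch.DownloadError:
        # Raised for timeouts and other transport failures;
        # it is not a subclass of urllib2.URLError.
        return None, None
    return result.status_code, result.content
```

In the handler you would then call something like `status, body = search(urlsearch, query, deadline=10)` in place of the second urllib2.urlopen call, and treat `status is None` as a failed fetch.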