Background: I’m working through the “Programming Collective Intelligence” book by Toby Segaran; specifically the Kayak API example from Chapter 5.
I can navigate with my browser (Chrome) to the Kayak API results page (which is all XML) here: http://www.kayak.com/s/basic/flight?searchid=%5Bsearchidhere%5D&c=999&apimode=1&sid=[sessionidhere]&version=1
(I’ve previously created the session ID and the search ID successfully)
However, when I run the following code:
import urllib2
import xml.dom.minidom
url = 'http://www.kayak.com/s/basic/flight?searchid=NQnNrj&c=999&apimode=1&_sid_=19-y2WnyKIGm1FuaLfo2keV&version=1'
doc = xml.dom.minidom.parseString(urllib2.urlopen(url).read())
I get the following error:
[...discarded top bit of Traceback...]
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
I’ve tested it with Python 2.5.4 and Python 2.7.1. I’m almost 100% sure I’ve previously experimented with this and it worked successfully, and I don’t know where I’m going wrong.
Could anybody please help? Thanks!
Your problem is probably related to cookies.
Coincidentally, I usually browse the web with JavaScript and cookies disabled for sites on which I don’t need them, and that is how I first clicked the link.
Without JS, cookies, and Referer information, I got a 404 page. After enabling all of those, I got a ‘Search Expired’ page instead. To confirm my theory, I then enabled only JS and Referer, leaving cookies disabled, and clicked the link again, which led me back to the 404 page.
So, build an opener with urllib2’s HTTPCookieProcessor (backed by a cookielib.CookieJar) and the issue should be resolved.
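A minimal sketch of that suggestion, assuming the 404 really is caused by missing cookies. The helper names build_cookie_opener and fetch_xml are made up for illustration, and the try/except import is only there so the same code runs on Python 2 (as in the question) and Python 3:

```python
import xml.dom.minidom

try:
    # Python 2, as used in the question
    import urllib2 as request_mod
    import cookielib as cookie_mod
except ImportError:
    # Python 3 equivalents of the same modules
    import urllib.request as request_mod
    import http.cookiejar as cookie_mod


def build_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = cookie_mod.CookieJar()
    opener = request_mod.build_opener(request_mod.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_xml(opener, url):
    """Hypothetical helper: fetch url through the opener and parse it as XML."""
    return xml.dom.minidom.parseString(opener.open(url).read())


opener, jar = build_cookie_opener()
# Assumed usage (not executed here): send every Kayak request, including the
# ones that create the session ID and search ID, through this same opener so
# that cookies set by the earlier responses accompany the results request.
# doc = fetch_xml(opener, url)
```

Since the browser test ended in a ‘Search Expired’ page rather than results, it may also be necessary to start a fresh search through the same opener before requesting the results URL.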
Regards