I have to implement a function to get headers only (without doing a GET or POST) using urllib2. Here is my function:
def getheadersonly(url, redirections = True):
if not redirections:
class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
def http_error_302(self, req, fp, code, msg, headers):
return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
http_error_301 = http_error_303 = http_error_307 = http_error_302
cookieprocessor = urllib2.HTTPCookieProcessor()
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)
class HeadRequest(urllib2.Request):
def get_method(self):
return "HEAD"
info = {}
info['headers'] = dict(urllib2.urlopen(HeadRequest(url)).info())
info['finalurl'] = urllib2.urlopen(HeadRequest(url)).geturl()
return info
Uses code from answer this and this. However this is doing redirection even when the flag is False. I tried the code with:
print getheadersonly("http://ms.com", redirections = False)['finalurl']
print getheadersonly("http://ms.com")['finalurl']
Its giving morganstanley.com in both cases. What is wrong here?
Firstly, your code contains several bugs:
On each request of
getheadersonlyyou install a new global urlopener which is then used in subsequent calls ofurllib2.urlopenYou make two HTTP-requests to get two different attributes of a response.
The implementation of
urllib2.HTTPRedirectHandler.http_error_302is not so trivial and I do not understand how can it prevent redirections in the first place.Basically, you should understand that each handler is installed in an opener to handle certain kind of response.
urllib2.HTTPRedirectHandleris there to convert certain http-codes into a redirections. If you do not want redirections, do not add a redirection handler into the opener. If you do not want to open ftp links, do not addFTPHandler, etc.That is all you need is to create a new opener and add the
urllib2.HTTPHandler()in it, customize the request to be ‘HEAD’ request and pass an instance of the request to the opener, read the attributes, and close the response.