I can use urllib2 to make HEAD requests like so:
import urllib2
request = urllib2.Request('http://example.com')
request.get_method = lambda: 'HEAD'
urllib2.urlopen(request)
The problem is that when urlopen follows a redirect, it appears to switch to GET instead of HEAD.
The purpose of this HEAD request is to check the size and content type of the URL I’m about to download, so that I don’t download some huge document. (The URL is supplied by a random internet user through IRC.)
How could I make it use HEAD requests when following redirects?
Good question! If you’re set on using urllib2, you’ll want to look at this answer about the construction of your own redirect handler. In short (read: blatantly stolen from the previous answer):
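What follows is only a rough sketch of the custom-redirect-handler idea, not necessarily the linked answer's exact code. The names `HeadRequest`, `HeadRedirectHandler` and `head` are made up for illustration, and the `try`/`except` import is an assumption added so the same sketch also runs on Python 3, where urllib2 became `urllib.request`.

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent, aliased for this sketch


class HeadRequest(urllib2.Request):
    """A Request that always reports HEAD as its HTTP method."""

    def get_method(self):
        return 'HEAD'


class HeadRedirectHandler(urllib2.HTTPRedirectHandler):
    """Build the follow-up request as HEAD instead of the default GET."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code not in (301, 302, 303, 307):
            raise urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
        # Drop body-related headers; a HEAD request carries no body.
        newheaders = dict((k, v) for k, v in req.headers.items()
                          if k.lower() not in ('content-length', 'content-type'))
        return HeadRequest(newurl.replace(' ', '%20'),
                           headers=newheaders,
                           origin_req_host=req.origin_req_host,
                           unverifiable=True)


def head(url):
    """Issue a HEAD request that stays HEAD across redirects (hypothetical helper)."""
    opener = urllib2.build_opener(HeadRedirectHandler)
    return opener.open(HeadRequest(url))
```

Calling `head('http://example.com')` should then return a response whose `info()` holds the headers of the final URL, so you can look at Content-Length and Content-Type before deciding whether to download.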
Also, as mentioned in the errata, you can use Python Requests.
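If you go the Requests route, a minimal sketch could look like the one below. `requests.head` does not follow redirects unless you pass `allow_redirects=True`, and when it does follow them it keeps using HEAD. The helper names `summarize` and `check_url` and the 10-second timeout are assumptions for illustration only.

```python
import requests


def summarize(headers):
    """Return (size_in_bytes_or_None, content_type_or_None) from response headers."""
    length = headers.get('Content-Length')
    size = int(length) if length is not None and length.isdigit() else None
    return size, headers.get('Content-Type')


def check_url(url, timeout=10):
    """HEAD the URL, following redirects, and report the final URL's size and type."""
    response = requests.head(url, allow_redirects=True, timeout=timeout)
    response.raise_for_status()
    return summarize(response.headers)
```

Keep in mind that a server may omit Content-Length on a HEAD response, in which case the size comes back as `None` and you would still need a cap on the actual download.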