When attempting to check the ‘content-length’ header for some web pages using urllib2 in

Question


Asked: June 5, 2026 (2026-06-05T21:47:28+00:00)


When I check the ‘content-length’ header for some web pages using urllib2 in Python, the header is missing. For example, the response from google.com does not include it. Any idea why?

Example:

import urllib2

r = urllib2.urlopen('http://www.google.com')
i = r.info()  # the response headers
print i.keys()

Gives:

['x-xss-protection', 'set-cookie', 'expires', 'server', 'connection', 'cache-control', 'date', 'p3p', 'content-type', 'x-frame-options']


1 Answer

Editorial Team · Answer 1 · 2026-06-05T21:47:29+00:00

An HTTP response can either declare its body length with Content-Length or send the body with Transfer-Encoding: chunked, in which case there is no Content-Length header.
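As a rough illustration of this, the sketch below classifies how a response body is delimited, given its headers. The function name body_framing and the sample header dicts are hypothetical, made up for this example. It also covers a third case not mentioned above: with neither header present, the body simply runs until the server closes the connection. It is written for Python 3, where urllib.request replaces urllib2.

```python
def body_framing(headers):
    """Return 'chunked', 'content-length' or 'close' for a mapping of headers."""
    # Header names are case-insensitive, so normalise them before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Transfer-Encoding may list several codings separated by commas.
    codings = [c.strip().lower()
               for c in lowered.get('transfer-encoding', '').split(',')]
    if 'chunked' in codings:
        # Chunked framing takes precedence over any Content-Length.
        return 'chunked'
    if 'content-length' in lowered:
        return 'content-length'
    # Neither header: the body ends when the connection is closed.
    return 'close'


print(body_framing({'Content-Length': '42'}))
print(body_framing({'Transfer-Encoding': 'chunked'}))
print(body_framing({'Connection': 'close'}))
```

The function only needs an object with an items() method, so in Python 3 it should also accept the headers attribute of the object returned by urllib.request.urlopen().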

When Transfer-Encoding: chunked is used, the body that follows the headers is sent as a series of chunks. Each chunk starts with a line containing a hexadecimal number which, converted to decimal, gives the length of that chunk's data in bytes. After the last chunk you get a chunk size of 0, which means you have reached the end of the body.

You can use a regular expression to extract this hexadecimal value (though it is not required):

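import re

# Example input: a raw chunked body, starting at a chunk-size line
read = '1a\r\nabcdefghijklmnopqrstuvwxyz\r\n0\r\n\r\n'
hexPat = re.compile(r'([0-9A-F]+)\r\n', re.I)
match = re.search(hexPat, read)  # first run of hex digits followed by CRLF
chunkLen = int(match.group(1), 16)  # converts hexadecimal to decimal (0x1a == 26)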

Alternatively, you can just read the first hexadecimal value to get the length of the first chunk, receive that chunk, then read the length of the next chunk, and so on until you find a 0.
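The read-a-size, read-a-chunk loop described above can be sketched as follows. This is a minimal illustration, not a full implementation. The function name decode_chunked and the sample bytes are hypothetical. It assumes a binary file-like object whose read(n) returns all n bytes, and it ignores chunk extensions and trailers. It uses Python 3. Note that urllib2 (and urllib.request in Python 3) decodes chunked bodies for you, so a loop like this is only needed when reading the raw response yourself, for example from a socket.

```python
import io


def decode_chunked(stream):
    """Read a chunked body from a binary file-like object; return the payload."""
    body = b''
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError('stream ended before the terminating 0-size chunk')
        # Anything after ';' on the size line is a chunk extension; ignore it.
        size_text = size_line.split(b';', 1)[0].strip().decode('ascii')
        size = int(size_text, 16)  # hexadecimal chunk length to decimal
        if size == 0:
            break  # a 0-size chunk marks the end of the body
        body += stream.read(size)
        stream.read(2)  # skip the CRLF that follows each chunk's data
    return body


# Hypothetical raw body: two chunks ("Wiki", "pedia") and the final 0 chunk.
raw = b'4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n'
print(decode_chunked(io.BytesIO(raw)))  # prints b'Wikipedia'
```

Once the body has been reassembled this way, len() of the result gives the size that a Content-Length header would have reported.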
