I use the following python code to download web pages from servers with gzip

Question

Asked: May 16, 2026

I use the following python code to download web pages from servers with gzip compression:

url = "http://www.v-gn.de/wbb/"
import urllib2
request = urllib2.Request(url)
request.add_header('Accept-encoding', 'gzip')
response = urllib2.urlopen(request)
content = response.read()
response.close()

import gzip
from StringIO import StringIO
html = gzip.GzipFile(fileobj=StringIO(content)).read()

This generally works, but for the URL above it fails with a struct.error exception.
I get a similar failure if I use wget with an "Accept-Encoding" header. Browsers, however, seem to be able to decompress the response.

So my question is: is there a way to get my Python code to decompress the HTTP response without disabling compression by removing the "Accept-Encoding" header?

For completeness, here’s the line I use for wget:

wget --user-agent="Mozilla" --header="Accept-Encoding: gzip,deflate" http://www.v-gn.de/wbb/

1 Answer

Editorial Team · Answer 1 · 2026-05-16T21:01:38+00:00

It appears you can call readline() on the gzip.GzipFile object, but
read() raises a struct.error because the file ends abruptly.

Since readline works (except at the very end), you could do something like this:

import urllib2
import StringIO
import gzip
import struct

url = "http://www.v-gn.de/wbb/"
request = urllib2.Request(url)
request.add_header('Accept-encoding', 'gzip')
response = urllib2.urlopen(request)
content = response.read()
response.close()

# Wrap the raw bytes in a file-like object so GzipFile can read from them.
gz = gzip.GzipFile(fileobj=StringIO.StringIO(content))
try:
    # Iterating reads line by line, which works until the truncated end.
    for line in gz:
        print(line.rstrip())
except struct.error:
    # The stream ends abruptly; keep whatever was decoded before the error.
    pass
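
A different approach, not taken from the answer above, is to skip gzip.GzipFile and decompress with zlib directly. A zlib decompression object returns whatever it has decoded so far and does not raise when the input stops before the gzip trailer. The sketch below assumes Python 3 (bytes and gzip.compress), and gunzip_lenient is a hypothetical helper name. The truncated payload is simulated locally instead of fetched from the server, on the assumption that the server's response is missing its end, as the answer suggests.

```python
import gzip
import zlib


def gunzip_lenient(data):
    """Decompress gzip bytes, returning whatever can be recovered
    even if the stream is cut off before its trailer."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # decompress() returns the bytes decoded so far; it does not raise
    # just because the input ends early.
    return decompressor.decompress(data)


# Simulate a response whose 8-byte gzip trailer (CRC32 + length) is missing.
original = b"<html>hello</html>\n" * 100
truncated = gzip.compress(original)[:-8]
recovered = gunzip_lenient(truncated)
```

In Python 2 the same zlib.decompressobj call should be usable on the content string read from urllib2. Note that this trades safety for leniency: with the trailer missing, the CRC check never runs, so corrupted data would go unnoticed.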
