I’m trying to process a large gzip file pulled from the internet in python


Asked: May 31, 2026


I’m trying to process a large gzip file pulled from the internet in Python, using urllib2 and zlib and techniques from these two Stack Overflow questions:

This works well, except that after reading each chunk I need to run some operations on the resulting string, which involve a lot of splitting and iterating. That takes some time, and when the code reaches the next req.read(), it returns nothing and the program ends, having read only the first chunk.

If I comment out the other operations, the whole file is read and decompressed. Code:

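import urllib2
import zlib

# 16 + MAX_WBITS makes zlib expect a gzip header and trailer
d = zlib.decompressobj(16 + zlib.MAX_WBITS)
CHUNK = 16 * 1024  # bytes to read per call
url = 'http://foo.bar/foo.gz'
req = urllib2.urlopen(url)
while True:
    chunk = req.read(CHUNK)
    if not chunk:
        # an empty read means the response is exhausted
        print "DONE"
        break
    s = d.decompress(chunk)
    # ...
    # lots of operations with s
    # which might take a while
    # but not more than 1-2 seconds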

Any ideas?

Edit:
This turned out to be a bug elsewhere in the program, NOT in the urllib2/zlib handling. Thanks to everyone who helped. I can recommend the pattern used in the code above if you need to handle large gzip files.


1 Answer

Editorial Team · Answer 1 · 2026-05-31T13:34:28+00:00


This turned out to be a bug elsewhere in the program, NOT in the urllib2/zlib handling. I can recommend the pattern used in the code above if you need to handle large gzip files.
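
For anyone on Python 3, where urllib2 was folded into urllib.request, the same chunked pattern might look roughly like the sketch below. The helper names iter_gunzip and iter_lines are made up for illustration and are not from the original code, and an in-memory BytesIO stands in for the HTTP response so the example runs without a network. Since the per-chunk work in the question involves splitting, the second helper also carries a partial trailing line over to the next chunk, because a line can straddle a chunk boundary.

```python
import gzip
import io
import zlib

CHUNK = 16 * 1024  # same chunk size as in the question


def iter_gunzip(fileobj, chunk_size=CHUNK):
    """Yield decompressed byte blocks from a gzip stream read in chunks.

    iter_gunzip is a hypothetical helper name, not part of any library.
    """
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # an empty read means the source is exhausted
        data = d.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered inside the decompressor.
    tail = d.flush()
    if tail:
        yield tail


def iter_lines(blocks):
    """Yield complete lines (newline removed) from an iterable of byte blocks.

    iter_lines is also a hypothetical helper name.
    """
    pending = b""
    for block in blocks:
        pending += block
        lines = pending.split(b"\n")
        # The last piece may be an incomplete line; keep it for the next block.
        pending = lines.pop()
        for line in lines:
            yield line
    if pending:
        yield pending


if __name__ == "__main__":
    # Stand-in for the HTTP response: any object with a read(n) method works.
    payload = b"\n".join(b"row %d" % i for i in range(5))
    fake_response = io.BytesIO(gzip.compress(payload))
    # A tiny chunk size forces lines to straddle chunk boundaries.
    for line in iter_lines(iter_gunzip(fake_response, chunk_size=8)):
        print(line.decode("ascii"))
```

With a real download, passing the object returned by urllib.request.urlopen(url) in place of fake_response should behave the same way, since the helper only needs a read(n) method.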
