What’s the recommended way to handle Content-Encoding: gzip files when using urlgrabber?


Editorial Team

Asked: June 18, 2026


What’s the recommended way to handle Content-Encoding: gzip files when using urlgrabber?

Right now I’m monkey-patching it like this:

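import gzip

from urlgrabber.grabber import URLGrabber, PyCurlFileObject

g = URLGrabber(http_headers=(("Accept-Encoding", "gzip"),))
g.is_compressed = False  # I don't know yet if the server will send me compressed data

# Back up the current method of handling downloaded headers (only once)
try:
    PyCurlFileObject.orig_hdr_retrieve
except AttributeError:
    PyCurlFileObject.orig_hdr_retrieve = PyCurlFileObject._hdr_retrieve

def hdr_retrieve(instance, buf):
    r = PyCurlFileObject.orig_hdr_retrieve(instance, buf)
    # Header lines may arrive as bytes under Python 3
    line = buf.decode("iso-8859-1") if isinstance(buf, bytes) else buf
    if "content-encoding" in line.lower() and "zip" in line.lower():
        g.is_compressed = True  # writes to the module-level grabber
    return r

PyCurlFileObject._hdr_retrieve = hdr_retrieve

g.urlgrab(url, dest)

if g.is_compressed:
    # ungzip the downloaded file in place
    with gzip.open(dest, "rb") as f_in:
        data = f_in.read()
    with open(dest, "wb") as f_out:
        f_out.write(data)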

But it doesn’t look very clean and I fear it’s not threadsafe either…
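The "ungzip file here" step only needs the standard library. Below is a minimal sketch, assuming urlgrabber wrote the response body to `dest` verbatim (still gzip-compressed). `looks_gzipped` and `gunzip_in_place` are hypothetical helper names, not part of urlgrabber. The magic-byte check is an extra safeguard in case a file is flagged as compressed but does not actually start with a gzip header; decompression streams into a temporary file and swaps it in, so a corrupt download does not clobber the original.

```python
import gzip
import os
import shutil
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def looks_gzipped(path):
    """Return True if the file at *path* starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def gunzip_in_place(path):
    """Replace a gzip file with its decompressed contents.

    Decompresses into a temporary file in the same directory, then swaps it
    in with os.replace, so an error halfway through leaves *path* untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out, gzip.open(path, "rb") as src:
            shutil.copyfileobj(src, out)  # stream, no need to hold it all in memory
        shutil.copymode(path, tmp_path)  # keep the original file's permissions
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

In the question's code this would slot into the last branch as `if g.is_compressed and looks_gzipped(dest): gunzip_in_place(dest)`.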


1 Answer

Editorial Team · Answer 1 · 2026-06-18T20:20:58+00:00

I think I’ve found a threadsafe solution:

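import pycurl
from urlgrabber.grabber import URLGrabber, PyCurlFileObject

g = URLGrabber(http_headers=(("Accept-Encoding", "gzip"),))
# Make the grabber reachable from each PyCurlFileObject via instance.opts
g.opts._set_attributes(grabber=g)

# Patch _set_opts only once, keeping a reference to the original
try:
    PyCurlFileObject.orig_setopts
except AttributeError:
    PyCurlFileObject.orig_setopts = PyCurlFileObject._set_opts

    def setopts(instance, opts={}):
        PyCurlFileObject.orig_setopts(instance, opts)
        grabber = instance.opts.grabber
        grabber.is_compressed = False  # reset for every new transfer

        def hdr_retrieve(buf):
            # Let urlgrabber handle the header first, then inspect it
            r = PyCurlFileObject._hdr_retrieve(instance, buf)
            line = buf.decode("iso-8859-1") if isinstance(buf, bytes) else buf
            if "content-encoding" in line.lower() and "zip" in line.lower():
                grabber.is_compressed = True
            return r

        # Replace the header callback that the original _set_opts installed
        instance.curl_obj.setopt(pycurl.HEADERFUNCTION, hdr_retrieve)

    PyCurlFileObject._set_opts = setopts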

But it still doesn’t feel quite “clean” 🙂
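With this patch the flag lives on the grabber rather than on a module-level variable, so it is per grabber, not per transfer: if one `URLGrabber` were shared between threads, concurrent downloads could still overwrite each other's `is_compressed`. The sketch below keeps the state on an object created for each transfer instead. `EncodingSniffer` is a hypothetical helper, not part of urlgrabber or pycurl. It also matches the header name exactly rather than by substring, and it resets on every status line so a `Content-Encoding` from a redirect response does not carry over to the final one.

```python
class EncodingSniffer:
    """Records the Content-Encoding of one transfer's final response."""

    def __init__(self):
        self.encoding = None

    def header_callback(self, buf):
        # pycurl hands header lines to the callback as bytes on Python 3;
        # HTTP header bytes are conventionally decoded as ISO-8859-1.
        line = buf.decode("iso-8859-1") if isinstance(buf, bytes) else buf
        if line.startswith("HTTP/"):
            # A status line starts a new response (e.g. after a redirect),
            # so forget whatever the previous response advertised.
            self.encoding = None
            return
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-encoding":
            self.encoding = value.strip().lower()

    @property
    def is_gzip(self):
        return self.encoding in ("gzip", "x-gzip")
```

With plain pycurl, each transfer would create its own instance and register it with `curl.setopt(pycurl.HEADERFUNCTION, sniffer.header_callback)`. Wiring it into urlgrabber would still need a hook like the `_set_opts` patch above, with the sniffer stored per `PyCurlFileObject` instead of on the grabber; that integration is an assumption and has not been checked against urlgrabber's internals.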
