I have an http response from urllib response = urllib2.urlopen(‘http://python.org/’) Eventually, I want to

Question

Asked: June 16, 2026

I have an HTTP response from urllib2:

response = urllib2.urlopen('http://python.org/')

Eventually, I want to be able to seek() within the response (at least to the beginning). So I want to be able to have code like this:

print result.readline()
result.seek(0)
print result.readline()

The simplest solution to this problem is to copy the response into a StringIO or io.BytesIO object, like this:

result = io.BytesIO(response.read())

However, the resources I want to request tend to be very large, and I want to start working with them (parse…) before the whole download has finished. response.read() blocks until the entire body has been read, so I’m looking for a non-blocking solution.

The ideal code would read(BUFFER_SIZE) from the resource and, whenever more content is needed, just request more from the response. I’m basically looking for a wrapper class that can do that. Oh, and I need a file-like object.

I thought I could write something like:

base = io.BufferedIOBase(response)
result = io.BufferedReader(base)

However, it turns out that this does not work. I have tried different classes from the io module but couldn’t get any of them working. I’m happy with any wrapper class that has the desired behaviour.

1 Answer

Editorial Team · Answer 1 · 2026-06-16T21:03:08+00:00

I wrote my own wrapper class which preserves the first chunk of data. This way I can seek back to the beginning, analyze the encoding, file type and other things. This class solves the problem for me and should be simple enough to adapt to other use cases.

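import cStringIO  # Python 2 only, matching the urllib2 code in the question


class BufferedFile(object):
    '''A buffered file that preserves the beginning of a stream, up to
    roughly buffer_size bytes (the last buffered line may overshoot).
    '''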
    def __init__(self, fp, buffer_size=1024):
        self.data = cStringIO.StringIO()
        self.fp = fp
        self.offset = 0
        self.len = 0
        self.fp_offset = 0
        self.buffer_size = buffer_size

    @property
    def _buffer_full(self):
        return self.len >= self.buffer_size

    def readline(self):
        # Positions in [len, fp_offset) were consumed from fp but never buffered.
        if self.len <= self.offset < self.fp_offset:
            raise BufferError('Line is not available anymore')
        if self.offset >= self.len:
            line = self.fp.readline()
            self.fp_offset += len(line)

            self.offset += len(line)

            if not self._buffer_full:
                self.data.write(line)
                self.len += len(line)
        else:
            line = self.data.readline()
            self.offset += len(line)
        return line

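    def seek(self, offset):
        # Positions in [len, fp_offset) were consumed from fp but never buffered.
        if self.len <= offset < self.fp_offset:
            raise BufferError('Cannot seek because data is not buffered here')
        self.offset = offset
        # Keep the buffer position in sync; clamping to len keeps later writes appending.
        self.data.seek(min(offset, self.len))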
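
If keeping everything read so far in memory is acceptable, the same idea can probably be pushed further so that every already-read position stays seekable, not only the first buffer_size bytes. What follows is only a rough sketch under that assumption and is not the class above: the name SeekableStream, its chunk_size parameter and the io.BytesIO object standing in for the urllib2 response are all made up for illustration. It pulls data from the underlying stream in chunks, and only when a read, readline or seek actually needs it.

```python
import io


class SeekableStream(object):
    """Hypothetical wrapper: caches every byte read so far, so any
    already-read position stays seekable. Memory grows with the download."""

    def __init__(self, fp, chunk_size=8192):
        self.fp = fp                  # underlying non-seekable stream
        self.cache = io.BytesIO()     # everything read from fp so far
        self.chunk_size = chunk_size
        self.eof = False

    def _fill(self, target):
        """Pull chunks from fp until the cache holds `target` bytes or fp ends."""
        pos = self.cache.tell()
        self.cache.seek(0, 2)         # 2 = seek relative to the end
        while not self.eof and self.cache.tell() < target:
            chunk = self.fp.read(self.chunk_size)
            if chunk:
                self.cache.write(chunk)
            else:
                self.eof = True
        self.cache.seek(pos)          # restore the logical read position

    def read(self, size):
        self._fill(self.cache.tell() + size)
        return self.cache.read(size)

    def readline(self):
        while True:
            pos = self.cache.tell()
            line = self.cache.readline()
            if line.endswith(b"\n") or self.eof:
                return line
            # Incomplete line: rewind, fetch more data, then retry.
            self.cache.seek(pos)
            self._fill(pos + len(line) + self.chunk_size)

    def seek(self, offset):
        self._fill(offset)            # read ahead if seeking past the cache
        self.cache.seek(offset)


# Stand-in for the HTTP response (assumption: any object with read(n) works).
source = io.BytesIO(b"first line\nsecond line\nthird line\n")
stream = SeekableStream(source, chunk_size=4)

first = stream.readline()
stream.seek(0)
again = stream.readline()             # same line again, served from the cache
```

The trade-off compared with the class above is memory: nothing is ever discarded, so this only makes sense when the parts you have already read fit in RAM.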
