I have an HTTP response from urllib2:
import urllib2
response = urllib2.urlopen('http://python.org/')
Eventually, I want to be able to seek() within the response, at least back to the beginning, so that code like this works:
print result.readline()
result.seek(0)
print result.readline()
The simplest solution to this problem is to read the whole body into a StringIO or io.BytesIO object, like this:
result = io.BytesIO(response.read())
However, the resources I want to request tend to be very large, and I want to start working with them (parse…) before the whole download has finished. response.read() blocks until the entire body has arrived, so I’m looking for a non-blocking solution.
The ideal code would read(BUFFER_SIZE) from the resource and, whenever more content is needed, request more from the response. I’m basically looking for a wrapper class that can do that. Oh, and I need a file-like object.
I thought I could write something like:
base = io.BufferedIOBase(response)
result = io.BufferedReader(base)
However, it turns out that this does not work. I have tried different classes from the io module but couldn’t get any of them working. I’m happy with any wrapper class that has the desired behaviour.
I wrote my own wrapper class that preserves the first chunk of data. This way I can analyze the encoding, file type and other things from the start of the stream and then seek back to the beginning. This class solves the problem for me and should be simple enough to adapt to other use cases.
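A wrapper along these lines can be sketched as follows. This is an illustration under stated assumptions, not the original class: the name FirstChunkReader, the chunk_size parameter and its default, and the rule that seek() only works while nothing past the preserved chunk has been consumed are all hypothetical choices. It only relies on the wrapped object having read() and readline(), so an io.BytesIO stands in for the urllib2 response here.

```python
import io


class FirstChunkReader(object):
    """File-like wrapper that keeps the first chunk so it can be replayed.

    Hypothetical sketch: only read(), readline() and a restricted seek()
    are provided.
    """

    def __init__(self, raw, chunk_size=8192):
        self._raw = raw
        # Read the first chunk eagerly and keep it for later rewinds.
        self._head = raw.read(chunk_size)
        self._pos = 0  # logical position within the whole stream

    def read(self, size=-1):
        read_all = size is None or size < 0
        head_part = b""
        if self._pos < len(self._head):
            # Serve as much as possible from the preserved chunk.
            end = len(self._head) if read_all else self._pos + size
            head_part = self._head[self._pos:end]
            self._pos += len(head_part)
            if not read_all:
                size -= len(head_part)
                if size == 0:
                    return head_part
        # Anything beyond the preserved chunk comes from the wrapped stream.
        rest = self._raw.read() if read_all else self._raw.read(size)
        self._pos += len(rest)
        return head_part + rest

    def readline(self):
        if self._pos < len(self._head):
            newline = self._head.find(b"\n", self._pos)
            if newline != -1:
                line = self._head[self._pos:newline + 1]
                self._pos += len(line)
                return line
            # The line starts in the preserved chunk and continues after it.
            start = self._head[self._pos:]
            rest = self._raw.readline()
            self._pos += len(start) + len(rest)
            return start + rest
        line = self._raw.readline()
        self._pos += len(line)
        return line

    def seek(self, offset, whence=0):
        # Rewinding is only safe while no data past the chunk was consumed;
        # otherwise bytes between the chunk and the raw position would be lost.
        inside = 0 <= offset <= len(self._head)
        if whence != 0 or not inside or self._pos > len(self._head):
            raise IOError("can only seek within the preserved first chunk")
        self._pos = offset


# Usage with a stand-in for the HTTP response (assumption: any object with
# read() and readline() works the same way).
response = io.BytesIO(b"<!doctype html>\n<html>\n")
result = FirstChunkReader(response)
first_line = result.readline()
result.seek(0)
assert result.readline() == first_line
```

The trade-off in this sketch is that only the first chunk is held in memory, so the rest of a large download can still be consumed as a stream. Once reading has moved past the preserved chunk, seek() raises IOError instead of silently skipping data.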