I am writing code that opens a (possibly gzipped) text file and works in both Python 2 and Python 3.
If I only had plain (uncompressed) text files, I could do:
    import io
    for line in io.open(file_name, encoding='some_encoding'):
        pass
If I did not care about decoding (working with strings / bytes in Python 2 / 3), I could do:
    import gzip
    if file_name.endswith('.gz'):
        file_obj = gzip.open(file_name)
    else:
        file_obj = open(file_name)
    for line in file_obj:
        pass
How can I handle both of these cases cleanly? In other words, how do I integrate decoding with gzip.open()?
I tested this briefly and it seems to do the right thing. You can provide a file object to both gzip.GzipFile and io.open. Combining them gives me a UnicodeDecodeError because the file I’m reading isn’t actually UTF-8, so it would appear to be doing the right thing.
For some reason, if I use io.open to open file.gz directly, gzip says that the file is not a compressed file.
UPDATE
Yeah, that’s silly, the streams are the wrong way around to begin with.
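One way to put the streams the right way around is to decompress first and decode last. The following is only a sketch under some assumptions: open_text is a hypothetical helper name, the default encoding of 'utf-8' is a placeholder, and the extra io.BufferedReader layer is there because, as far as I know, Python 2.7's GzipFile does not provide everything io.TextIOWrapper expects (on Python 3 it should be harmless).

```python
import gzip
import io


def open_text(file_name, encoding='utf-8'):
    """Open a plain or gzipped file and return decoded text (hypothetical helper)."""
    if file_name.endswith('.gz'):
        # Innermost layer: gzip decompresses and yields raw bytes.
        raw = gzip.open(file_name, 'rb')
        # Middle layer: buffering, assumed to be needed on Python 2.7.
        buffered = io.BufferedReader(raw)
        # Outermost layer: decode the bytes into unicode text.
        return io.TextIOWrapper(buffered, encoding=encoding)
    # Uncompressed files: io.open already decodes on both Python 2 and 3.
    return io.open(file_name, encoding=encoding)
```

With this, the loop from the question becomes "for line in open_text(file_name, encoding='some_encoding'):" regardless of whether the file is compressed.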
test file
The following code decodes the compressed file with the defined codec
The codecs.StreamReader takes a stream, so you should be able to pass the compressed or uncompressed files to it.
http://docs.python.org/library/codecs.html#codecs
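A minimal sketch of the StreamReader idea, not necessarily the code this answer originally contained: codecs.getreader looks up the StreamReader class for a codec, and that class is then wrapped around a byte stream. The helper name open_decoded and the default 'utf-8' encoding are assumptions for illustration.

```python
import codecs
import gzip


def open_decoded(file_name, encoding='utf-8'):
    """Return a decoding reader over a plain or gzipped file (hypothetical helper)."""
    if file_name.endswith('.gz'):
        byte_stream = gzip.open(file_name, 'rb')
    else:
        # Open in binary mode so the StreamReader does all the decoding.
        byte_stream = open(file_name, 'rb')
    # getreader() returns the StreamReader class registered for the codec;
    # calling it wraps the byte stream in a decoding reader.
    return codecs.getreader(encoding)(byte_stream)
```

The returned reader can be iterated line by line. One caveat to be aware of: StreamReader splits lines with the unicode splitlines rules, which treat a few more characters as line boundaries than a plain '\n' split does.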