I’m having problems with an archive that I built using zipfile in Python. I’m iterating over all the files in a directory and writing them to an archive. When I attempt to extract them afterward I get an exception related to the path separator.
import cStringIO
import os
import zipfile
the_path = "C:\\path\\to\\folder"
zipped_cache = cStringIO.StringIO()
zf = zipfile.ZipFile(zipped_cache, "w", zipfile.ZIP_DEFLATED)
for dirname, subdirs, files in os.walk(the_path):
    for filename in files:
        zf.write(os.path.join(dirname, filename), os.path.join(dirname[1+len(the_path):], filename))
zf.extractall("C:\\destination\\path")
zf.close()
zipped_cache.close()
Here’s the exception:
zipfile.BadZipfile: File name in directory "env\index" and header "env/index" differ.
Update: I replaced the string buffer cStringIO.StringIO() with a temporary file (tempfile.mkstemp("temp.zip")) and now it works. There’s something that happens when the zipfile module writes to the buffer that corrupts the archive, not sure what the problem is though.
The issue was that I was reading/writing the information from/into files that were open in "r"/"w" mode instead of "rb"/"wb". This isn’t an issue on Linux, but on Windows text mode translates line endings, which corrupts binary data such as a zip archive. Solved.
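A minimal sketch of the binary-mode point, assuming Python 3, where io.BytesIO stands in for cStringIO.StringIO. The helper names build_archive and save_and_reload and the file name temp.zip are hypothetical, chosen only for illustration:

```python
import io
import os
import tempfile
import zipfile


def build_archive(members):
    """Return the bytes of a zip archive built in memory from {name: data}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    # Read the buffer only after the archive is closed, so the central
    # directory has already been written.
    return buffer.getvalue()


def save_and_reload(archive_bytes, path):
    """Write the archive with "wb" and read it back with "rb"."""
    with open(path, "wb") as out:  # binary mode: no newline translation
        out.write(archive_bytes)
    with open(path, "rb") as src:
        return src.read()


if __name__ == "__main__":
    payload = build_archive({"env/index": b"line one\nline two\n"})
    target = os.path.join(tempfile.mkdtemp(), "temp.zip")
    restored = save_and_reload(payload, target)
    with zipfile.ZipFile(io.BytesIO(restored)) as zf:
        print(zf.namelist())
```

With both ends opened in binary mode, the bytes read back should be identical to the bytes written on any platform.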
Found the answer to my question here: http://www.penzilla.net/tutorials/python/scripting.
I’m pasting the two functions that are relevant to zipping up a directory. The problem was not the string buffer, nor the slashes, but the way I was iterating and writing to the zipfile. These 2 recursive functions fix the problem. Iterating over the entire tree of sub-directories with os.walk is not a good way to write the archive.
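The functions from the linked tutorial are not reproduced here. As a rough illustration of the recursive approach only, here is a minimal sketch assuming Python 3 and io.BytesIO. The names add_directory and zip_directory are hypothetical and this is not the tutorial's code:

```python
import io
import os
import zipfile


def add_directory(zf, directory, arc_prefix=""):
    """Recursively add the contents of `directory` to the open ZipFile `zf`."""
    for entry in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, entry)
        # Archive names are joined with "/" rather than os.path.join,
        # since zip member names use forward slashes on every platform.
        arc_name = arc_prefix + "/" + entry if arc_prefix else entry
        if os.path.isdir(full_path):
            add_directory(zf, full_path, arc_name)
        else:
            zf.write(full_path, arc_name)


def zip_directory(directory):
    """Return an in-memory buffer holding a zip of `directory`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        add_directory(zf, directory)
    # The archive is closed at this point; rewind so callers can read it.
    buffer.seek(0)
    return buffer
```

To extract, reopen the finished buffer in read mode, for example zipfile.ZipFile(zip_directory(the_path)).extractall("C:\\destination\\path"), rather than calling extractall on the archive that is still open for writing. Note that this sketch skips empty directories.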