I have large (~75 MB) pickled objects that are made available on mapped network drives (e.g. X:/folder1/large_pickled_item.pk).
The objects contain NumPy arrays and Python lists, and are pickled using cPickle with protocol 2.
When I try to unpickle the data, I get the following errors:
- Using pickle: KeyError: (random character)
- Using cPickle: IOError: [Errno 22] Invalid argument
I do not get errors if the pickled objects are smaller, or if I copy the (larger) objects to a local drive and run the same script.
Any idea where the problem lies? Is it a Python/pickle problem or a Windows shares issue?
Notes:
- I am using Python 2.7.2 on Windows XP Professional (SP3)
- I do not have control over the object format: I do not create the objects, I can only read them.
Example stack trace:
  File "test.py", line 38, in getObject
    obj = pickle.load(input)
  File "C:\software\python\lib\pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "C:\software\python\lib\pickle.py", line 858, in load
    dispatch[key](self)
KeyError: '~'
Solution
- Read the file in chunks of 67076095 bytes (just under 64 MiB) into a string buffer.
- Call pickle.loads with the string buffer instead of pickle.load with the file object.
The errors are due to a Windows bug whereby reading or writing files on a network share in chunks larger than 64 MB does not work.
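A minimal sketch of the two steps above, written so it should run on both Python 2.7 and Python 3. The function name load_pickle_in_chunks is hypothetical; the chunk size is the one given in this answer. On Python 2.7, `import cPickle as pickle` should also work here, since cPickle provides loads() too.

```python
import pickle

# Chunk size from the answer: 67076095 bytes, just under 64 MiB
# (67108864 bytes), so no single read() crosses the reported limit.
CHUNK_SIZE = 67076095


def load_pickle_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: read a pickle file in bounded chunks,
    then unpickle it from the in-memory buffer with pickle.loads."""
    chunks = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)  # never more than chunk_size bytes
            if not chunk:               # empty result means EOF
                break
            chunks.append(chunk)
    # Join once at the end instead of concatenating inside the loop.
    return pickle.loads(b''.join(chunks))
```

Usage would be along the lines of obj = load_pickle_in_chunks(r'X:\folder1\large_pickled_item.pk'). The whole file ends up in memory before unpickling, which should be acceptable for objects of around 75 MB.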
I suggest trying the mirror image of the workaround presented in https://stackoverflow.com/a/4228291/367273.
If that doesn't help, perhaps you could create a wrapper for the file object that automatically splits every large read() into multiple smaller reads, and present that wrapper to the pickle module?
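A minimal sketch of that wrapper idea, assuming the unpickler only needs read() and readline() from the object it is given (both pickle and cPickle on Python 2.7, and pickle on Python 3, accept any object with those two methods). ChunkedReader, load_with_wrapper and the 16 MiB default are hypothetical choices, not part of the answer. Only read() and readline() are exposed, so the unpickler cannot fall back to other file methods that would bypass the splitting.

```python
import pickle


class ChunkedReader(object):
    """Hypothetical file wrapper: splits each large read() into
    several reads of at most max_chunk bytes."""

    def __init__(self, f, max_chunk=16 * 1024 * 1024):
        self._f = f
        self._max_chunk = max_chunk  # assumed safely below 64 MB

    def read(self, n=-1):
        parts = []
        if n is None or n < 0:
            # Read to EOF, one bounded chunk at a time.
            while True:
                part = self._f.read(self._max_chunk)
                if not part:
                    break
                parts.append(part)
            return b''.join(parts)
        remaining = n
        while remaining > 0:
            part = self._f.read(min(remaining, self._max_chunk))
            if not part:
                break  # EOF: return what was read, like file.read()
            parts.append(part)
            remaining -= len(part)
        return b''.join(parts)

    def readline(self):
        # Assumption: lines in a protocol 2 stream are short (e.g. module
        # and class names), so readline() is delegated unchanged.
        return self._f.readline()


def load_with_wrapper(path, max_chunk=16 * 1024 * 1024):
    """Hypothetical helper: unpickle a file through ChunkedReader."""
    with open(path, 'rb') as f:
        return pickle.load(ChunkedReader(f, max_chunk))
```

Unlike reading the whole file first, this keeps the normal pickle.load call and only changes the object passed to it. A side effect on Python 2.7 is that cPickle no longer reads the underlying file directly, since it is handed a plain Python object instead of a real file.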