I am trying to compress a huge Python object (~15 GB) and save it to disk. Due to requirement constraints I need to compress this file as much as possible. I am presently using zlib.compress() at level 9. My main concern is that the memory used during compression exceeds what I have available on the system (32 GB), and the size of the object is expected to increase going forward. Is there a more efficient or better way to achieve this?
Thanks.
Update: Note also that the object I want to save is a sparse numpy matrix, and that I serialize the data before compressing it, which further increases memory consumption. Since I do not need the Python object after it is serialized, would gc.collect() help?
Incremental (de)compression should be done with zlib.{de,}compressobj() so that memory consumption can be minimized. Additionally, higher compression ratios can be attained for most data by using bz2 instead.
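
A minimal sketch of the incremental approach, assuming the serialized data can be read in chunks from a file-like object (for example, a file it was already dumped to). The helper names compress_stream and decompress_stream and the 1 MiB chunk size are illustrative choices, not part of zlib or bz2. The same loop works with either zlib.compressobj() or bz2.BZ2Compressor(), since both expose compress() and flush().

```python
import bz2
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary illustrative value


def compress_stream(src, dst, compressor, chunk_size=CHUNK_SIZE):
    """Read src chunk by chunk, compress each chunk, and write it to dst."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
    # flush() emits whatever the compressor is still buffering
    dst.write(compressor.flush())


def decompress_stream(src, dst, decompressor, chunk_size=CHUNK_SIZE):
    """Read compressed data from src chunk by chunk and write the result to dst."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))
    # zlib decompression objects have flush(); bz2.BZ2Decompressor does not
    flush = getattr(decompressor, "flush", None)
    if flush is not None:
        dst.write(flush())


if __name__ == "__main__":
    import io

    raw = b"0 0 0 1 0 0 0 2 " * 50000  # stand-in for serialized matrix bytes
    packed = io.BytesIO()
    compress_stream(io.BytesIO(raw), packed, bz2.BZ2Compressor(9))
    packed.seek(0)
    restored = io.BytesIO()
    decompress_stream(packed, restored, bz2.BZ2Decompressor())
    print(restored.getvalue() == raw)
```

With real files, src and dst would be objects opened with open(path, "rb") and open(path, "wb"). Only one chunk and the compressor's internal buffers are held in memory at a time, so peak usage no longer depends on the total size of the data being compressed. Swapping in zlib.compressobj(9) and zlib.decompressobj() gives the zlib variant.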