I need to create ZIP archives on demand, using either Python zipfile module or unix command line utilities.
Resources to be zipped are often > 1GB and not necessarily compression-friendly.
How do I efficiently estimate its creation time / size?
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
Extract a bunch of small parts from the big file. Maybe 64 chunks of 64k each. Randomly selected.
Concatenate the data, compress it, measure the time and the compression ratio. Since you’ve randomly selected parts of the file chances are that you have compressed a representative subset of the data.
Now all you have to do is to estimate the time for the whole file based on the time of your test-data.