I am currently getting an md5 checksum as follows:
>>> import hashlib
>>> f = open(file, 'rb')
>>> m = hashlib.md5()
>>> m.update(f.read())
>>> checksum = m.hexdigest()
I need to return the checksum of a large video file, which will take several minutes to generate. How would I implement a percentage counter that prints the percentage complete at each percentage point while it is running? Something like:
>>> checksum = m.hexdigest()
1% done...
2% done...
etc.
You can call the update() method repeatedly and feed the file to it in chunks. That way, you can show the progress yourself.
When I try print digest_with_progress('/bin/bash', 1024), this is what I get:
Here are the actual details of this file.
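A minimal sketch of what a function like digest_with_progress might look like, assuming the (filename, chunk_size) signature implied by the call above. The output format is an assumption modelled on the question, and the code is written for Python 3:

```python
import hashlib
import os


def digest_with_progress(filename, chunk_size):
    """Return the MD5 hex digest of filename, printing progress as it goes."""
    total_size = os.path.getsize(filename)
    read_size = 0
    last_percent = 0
    digest = hashlib.md5()
    # Binary mode, so the bytes are hashed exactly as stored on disk.
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            digest.update(chunk)
            read_size += len(chunk)
            # Progress is only known after a chunk has been consumed.
            percent = 100 * read_size // total_size
            if percent > last_percent:
                print('%d%% done...' % percent)
                last_percent = percent
    return digest.hexdigest()
```

The function prints a line each time the integer percentage increases and returns the hex digest at the end, so the result matches hashing the whole file in one update() call.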
Note that you will not get the expected output if you make chunk_size too large. For example, if we read /bin/bash in 100 KB chunks instead of 1 KB chunks, this is what you see:
The limitation of this approach is that progress is calculated only after a chunk has been read into the digest. So, if the chunk size is too large, progress jumps by more than 1% each time you read a chunk and update the digest. On the other hand, a bigger chunk size gets the job done a bit quicker, so you might want to relax the requirement of printing every percentage point in favour of efficiency.
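To make the trade-off concrete: each chunk advances progress by roughly 100 * chunk_size / file_size percent, so a chunk no larger than 1% of the file keeps every percentage point visible. A small sketch of choosing a chunk size on that basis; the helper name pick_chunk_size and the 1 MB cap are illustrative assumptions:

```python
def pick_chunk_size(total_size, max_chunk=1024 * 1024):
    """Pick a chunk size of at most 1% of the file, capped at max_chunk.

    For files smaller than 100 bytes, 1% rounds down to zero, so the
    result falls back to 1 byte and steps may exceed 1%.
    """
    one_percent = total_size // 100
    return max(1, min(max_chunk, one_percent))
```

For a multi-gigabyte video file, 1% of the size is far larger than 1 MB, so the cap applies and progress still advances in steps well under 1%.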