Possible Duplicates:
Finding duplicate files and removing them.
In Python, is there a concise way of comparing whether the contents of two text files are the same?
What is the easiest way to see if two files are the same content-wise in Python?
One thing I can do is md5 each file and compare. Is there a better way?
Yes, I think hashing the files is the best approach if you have to compare several files and store the hashes for later comparison. Because hashes can collide, a byte-by-byte comparison may also be needed, depending on the use case.
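As a rough illustration of the hashing approach, here is a minimal sketch that reads each file in chunks so large files are not loaded into memory at once. The helper names file_digest and equal_by_hash, the default algorithm, and the chunk size are assumptions made for this example, not part of any library.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`, read in chunks.

    `algorithm` is any name accepted by hashlib.new(); pass "md5" to
    match the approach mentioned in the question.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def equal_by_hash(path_a, path_b, algorithm="sha256"):
    """Hypothetical helper: True if both files have the same digest."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

The digests can be stored (for example in a dict keyed by path) and compared later without rereading the files, which is where hashing pays off.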
Generally, a byte-by-byte comparison is sufficient and efficient, and the filecmp module already does this, along with other things. See http://docs.python.org/library/filecmp.html
e.g.
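A minimal sketch of a content comparison with filecmp.cmp from the standard library. The wrapper name same_content is only an illustration; shallow=False is passed so that the file contents are actually read.

```python
import filecmp


def same_content(path_a, path_b):
    """Return True if the two files have identical contents.

    shallow=False makes filecmp.cmp read and compare the bytes
    instead of trusting matching os.stat() signatures.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Calling same_content("file1.txt", "file2.txt") returns True or False; the two file names here are placeholders.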
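Note that by default, filecmp.cmp does not necessarily compare the contents of the files: if their os.stat() signatures (file type, size, and modification time) match, the files are reported as equal without being read. To force a content comparison, pass a third parameter, shallow=False.
Speed consideration: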
Usually, if only two files have to be compared, hashing both and comparing the digests is slower than a simple byte-by-byte comparison, provided the latter is done efficiently. For example, the code below tries to time hash vs. byte-by-byte comparison.
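A minimal sketch of such a timing comparison, written for illustration under stated assumptions rather than as a definitive benchmark: the function names, the 64 KiB chunk size, the 1 MiB test file size, and the use of two identical temporary files are all choices made for this example. Identical files are the worst case for the byte-by-byte method, since it cannot stop early.

```python
import hashlib
import os
import tempfile
import timeit

CHUNK = 64 * 1024  # assumed chunk size for both methods


def equal_by_hash(path_a, path_b):
    """Compare two files by their MD5 digests."""
    def digest(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(CHUNK), b""):
                h.update(chunk)
        return h.digest()
    return digest(path_a) == digest(path_b)


def equal_by_bytes(path_a, path_b):
    """Compare two files chunk by chunk, stopping at the first difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            ca = fa.read(CHUNK)
            cb = fb.read(CHUNK)
            if ca != cb:
                return False
            if not ca:  # both reads returned b"": end of both files
                return True


def time_both(size=1024 * 1024, repeats=5):
    """Time both methods on two identical random files of `size` bytes."""
    data = os.urandom(size)
    with tempfile.TemporaryDirectory() as d:
        pa = os.path.join(d, "a.bin")
        pb = os.path.join(d, "b.bin")
        for p in (pa, pb):
            with open(p, "wb") as f:
                f.write(data)
        t_hash = timeit.timeit(lambda: equal_by_hash(pa, pb), number=repeats)
        t_bytes = timeit.timeit(lambda: equal_by_bytes(pa, pb), number=repeats)
    return t_hash, t_bytes


if __name__ == "__main__":
    t_hash, t_bytes = time_both()
    print("hash: %.4fs  byte-by-byte: %.4fs" % (t_hash, t_bytes))
```

The absolute numbers depend on the machine, the file size, and the operating system's file cache, so only the relative difference is worth reading.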
Disclaimer: this is not the best way of timing or comparing two algorithms, and there is room for improvement, but it does give a rough idea. If you think it should be improved, tell me and I will change it.
and the output is