I have a large directory that contains only CS and math material. It is over 16 GB in size. The file types are text, PNG, PDF and CHM. There are currently two branches of it: my brother’s and mine. Both started from the same initial files. I need to compare them. I have tried Git, but it takes a long time to load.
What is the best way to compare two big directories?
[Mixed Solution]
- Do a ‘ls -R > different_files’ in both directories [1]
- ‘sdiff <(md5deep < file1) <(md5deep < file2)’ [2]
What do you think? Any drawbacks?
[1] Thanks to Paul Tomblin. [2] Many thanks to everyone who replied!
How to compare 2 folders without pre-existing commands/products:
Write a program that walks a directory and computes a hash of each file. It writes an output file listing each relative file path with its hash, sorted by path so that the order is the same on every run.
Run this program on both folders.
Then compare the two output files to see if they are the same. Because both are in the same sorted order, you can load each one into a string and do a plain string comparison; a line-by-line diff of the two additionally shows which files differ.
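The choice of hashing algorithm matters little here, because the goal is to detect accidental differences rather than to resist tampering: MD5, SHA or CRC will all work. You could also include the file size in the output files to further reduce the chance of an undetected collision.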
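The approach described above could look roughly like the following Python sketch. The function names (build_manifest, compare_manifests), the MD5 default and the 1 MiB chunk size are my own choices for illustration, and the manifests are kept in memory as dictionaries instead of being written to output files, which amounts to the same comparison.

```python
import hashlib
import os
import sys


def build_manifest(root, algorithm="md5", chunk_size=1 << 20):
    """Map each file's path (relative to root) to a (size, hex digest) pair.

    Assumes every entry is a regular, readable file; broken symlinks or
    permission errors would raise and are not handled in this sketch.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.new(algorithm)
            with open(full, "rb") as fh:
                # Read in chunks so large files are never held fully in memory.
                for chunk in iter(lambda: fh.read(chunk_size), b""):
                    digest.update(chunk)
            rel = os.path.relpath(full, root)
            manifest[rel] = (os.path.getsize(full), digest.hexdigest())
    return manifest


def compare_manifests(left, right):
    """Return (only_left, only_right, changed) as sorted lists of relative paths."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    changed = sorted(p for p in set(left) & set(right) if left[p] != right[p])
    return only_left, only_right, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    only_left, only_right, changed = compare_manifests(
        build_manifest(sys.argv[1]), build_manifest(sys.argv[2])
    )
    for label, paths in (("only in first", only_left),
                         ("only in second", only_right),
                         ("changed", changed)):
        for path in paths:
            print(f"{label}: {path}")
```

Saved under a name of your choosing, it takes the two directories as its two command-line arguments. Keying on the relative path means a file that was moved or renamed shows up as one removal plus one addition, not as a move.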
How to compare 2 folders with pre-existing commands/products:
If you just want an existing program that does this, use diff -r, or WinDiff on Windows-based systems.
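If neither tool is at hand, Python's standard library has filecmp.dircmp, which can be walked recursively for a similar report. This is only a rough analogue of a recursive diff, not a replacement: the report_differences name is hypothetical, and dircmp's default comparison is shallow, meaning files whose type, size and modification time all match are treated as equal without their contents being read.

```python
import filecmp
import os


def report_differences(left, right):
    """Recursively collect (kind, relative path) pairs for two directory trees."""
    differences = []

    def walk(cmp, prefix):
        for name in cmp.left_only:
            differences.append(("only in left", os.path.join(prefix, name)))
        for name in cmp.right_only:
            differences.append(("only in right", os.path.join(prefix, name)))
        # diff_files relies on dircmp's shallow comparison (see note above).
        for name in cmp.diff_files:
            differences.append(("differs", os.path.join(prefix, name)))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    # ignore=[] stops dircmp from silently skipping its default ignore list
    # (version-control directories and similar).
    walk(filecmp.dircmp(left, right, ignore=[]), "")
    return sorted(differences)
```

Calling report_differences on the two top-level folders returns a sorted list that can be printed or written to a file.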