I need to compare two directory structures with around one billion files each (directory depth up to 20 levels).
I found the usual diff -r /location/one /location/two slow.
Is there a multithreaded implementation of diff? Or is it doable by combining shell and diff? If so, how?
Your disk is gonna be the bottleneck.
Unless you are working on tmpfs, you will probably only lose speed. That said:
Running one diff -qr per top-level directory through xargs should do a pretty decent job of comparing trees (in this case . to /tmp/othertree).
It has a flaw right now, in that it won't detect top-level directories in othertree that don't exist in . (the current directory). I leave that as an exercise for the reader, though you could easily repeat the comparison in reverse.
The argument -P4 to xargs specifies that you want at most 4 concurrent processes.
Also have a look at the xjobs utility, which does a better job at separating the output. With GNU xargs I think you cannot drop the -q option, because the output of the concurrent diffs would get intermixed (?).
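
A minimal sketch of the approach described above, not necessarily the exact command originally posted. The function name compare_trees and its arguments are hypothetical; it assumes a find and an xargs that support -mindepth/-maxdepth/-print0 and -0/-P (GNU and BSD both do), and that the second tree is given as an absolute path.

```shell
#!/bin/sh
# compare_trees LEFT RIGHT [NJOBS]   (hypothetical helper, for illustration)
# Runs one "diff -qr" per top-level directory of LEFT, at most NJOBS at a time.
# RIGHT must be an absolute path, because the function changes into LEFT.
# Limitations: plain files sitting directly in LEFT are not compared, and
# directories that exist only in RIGHT are not noticed.
compare_trees() {
    left=$1
    right=$2
    njobs=${3:-4}
    (
        # Subshell, so the cd does not affect the caller.
        cd "$left" || exit 2
        # -mindepth 1 skips "." itself, which would otherwise trigger one
        # serial diff of the entire tree.
        find . -mindepth 1 -maxdepth 1 -type d -print0 |
            xargs -0 -P "$njobs" -I{} diff -qr {} "$right"/{}
    )
}
```

Calling compare_trees /location/one /location/two 4 prints one line per differing or missing file. xargs exits non-zero when any diff reported a difference, so a non-zero status alone does not mean the run failed. With both paths absolute, swapping the two arguments repeats the comparison in reverse.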