Usually, to find out how two binary files differ, I use diff and hexdump. But sometimes, given two large binary files of the same size, I would like to see only their quantitative differences, such as the number of differing regions and the cumulative difference.
Example: two files, A and B. They have 2 differing regions, and their cumulative difference is
(6c-a3) + (6c-11) + (6f-6e) + (20-22).
File A = 48 65 6c 6c 6f 2c 20 57
File B = 48 65 a3 11 6e 2c 22 57
              |--------|  |--|
                reg 1     reg 2
How can I get such information using standard GNU tools and Bash, or would a simple Python script be better? Other statistics about how the two files differ could also be useful, but I don't know what else can be measured, or how. Entropy difference? Variance difference?
For everything but the regions thing you can use numpy. Something like this (untested):
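A minimal sketch of that idea, assuming both inputs have the same length and fit in memory. The function name byte_diff_stats and the choice of statistics are illustrative assumptions, not something specified in the question; the sample bytes are the ones from the question's example.

```python
import numpy as np

def byte_diff_stats(a: bytes, b: bytes) -> dict:
    """Summarise how two equal-length byte strings differ (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # Widen to int16 so the subtraction cannot wrap around as uint8 would.
    x = np.frombuffer(a, dtype=np.uint8).astype(np.int16)
    y = np.frombuffer(b, dtype=np.uint8).astype(np.int16)
    delta = x - y
    return {
        "differing_bytes": int(np.count_nonzero(delta)),  # positions where A != B
        "signed_sum": int(delta.sum()),                   # sum of (A - B)
        "absolute_sum": int(np.abs(delta).sum()),         # sum of |A - B|
    }

# The two byte sequences from the question's example.
a = bytes.fromhex("48656c6c6f2c2057")
b = bytes.fromhex("4865a3116e2c2257")
stats = byte_diff_stats(a, b)
print(stats)
```

For real files, the byte strings could be replaced by arrays read with np.fromfile(path, dtype=np.uint8). Whether the signed or the absolute sum is the more useful "cumulative difference" depends on what you want to measure.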
I couldn’t find a numpy function for computing the regions, but you can write your own using
a != b as input; it shouldn’t be hard. See this question for inspiration.
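One possible way to write that region finder, as a sketch rather than a definitive implementation: take the boolean mask a != b, pad it with False at both ends, and look for the 0 -> 1 and 1 -> 0 transitions with np.diff. The function name diff_regions is hypothetical, and a "region" is assumed here to mean a maximal run of consecutive differing bytes.

```python
import numpy as np

def diff_regions(a: bytes, b: bytes) -> list:
    """Return (start, end) pairs, end exclusive, for each run of differing bytes."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    x = np.frombuffer(a, dtype=np.uint8)
    y = np.frombuffer(b, dtype=np.uint8)
    mask = x != y
    # Pad with False on both sides so runs touching either end are still closed.
    padded = np.concatenate(([False], mask, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # False -> True transitions
    ends = np.flatnonzero(edges == -1)    # True -> False transitions
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

# The two byte sequences from the question's example.
a = bytes.fromhex("48656c6c6f2c2057")
b = bytes.fromhex("4865a3116e2c2257")
regions = diff_regions(a, b)
print(len(regions), regions)
```

On the example from the question this reports two regions, covering byte offsets 2 to 4 and offset 6.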