What’s the most efficient way to identify a binary file? I would like to


Asked: May 16, 2026 (2026-05-16T05:58:56+00:00)


What’s the most efficient way to identify a binary file? I would like to extract some kind of signature from a binary file and use it to compare it with others.

The brute-force approach would be to use the whole file as a signature, which would take too long and too much memory. I’m looking for a smarter approach to this problem, and I’m willing to sacrifice a little accuracy (but not too much, ey) for performance.

(while Java code-examples are preferred, language-agnostic answers are encouraged)

Edit: Scanning the whole file to create a hash has the disadvantage that the bigger the file, the longer it takes. Since the hash wouldn’t be unique anyway, I was wondering whether there is a more efficient approach (e.g. a hash computed from an evenly distributed sample of the file’s bytes).


1 Answer

Editorial Team · Answer 1 · 2026-05-16T05:58:57+00:00

An approach I found effective for this sort of thing was to calculate two SHA-1 hashes: one for the first block of the file (I arbitrarily picked 512 bytes as the block size) and one for the whole file. I then stored the two hashes along with the file size. When I needed to identify a file, I would first compare the file lengths. If the lengths matched, I would compare the hash of the first block, and only if that matched would I compare the hash of the entire file. The first two tests quickly weeded out a lot of non-matching files.
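For illustration, here is a minimal Java sketch of that three-stage check. The class name `FileSignature` and its methods `of` and `matches` are hypothetical names made up for this example, not part of any library; only the SHA-1 choice and the 512-byte block size come from the description above. It assumes the stored signature is built once per known file and that each candidate is tested lazily, so the full-file hash is computed only when the size and the first-block hash already agree.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical helper: size + first-block SHA-1 + full-file SHA-1. */
final class FileSignature {
    // Block size from the answer; the value is arbitrary.
    static final int BLOCK_SIZE = 512;

    private final long size;
    private final byte[] firstBlockHash;
    private final byte[] fullHash;

    private FileSignature(long size, byte[] firstBlockHash, byte[] fullHash) {
        this.size = size;
        this.firstBlockHash = firstBlockHash;
        this.fullHash = fullHash;
    }

    /** Builds the stored signature for a known file (reads the whole file once). */
    static FileSignature of(Path file) throws IOException, NoSuchAlgorithmException {
        return new FileSignature(
                Files.size(file),
                sha1(file, BLOCK_SIZE),
                sha1(file, Long.MAX_VALUE));
    }

    /** Cheapest test first: length, then first block, then the entire file. */
    boolean matches(Path candidate) throws IOException, NoSuchAlgorithmException {
        if (Files.size(candidate) != size) {
            return false; // no file content read at all
        }
        if (!Arrays.equals(sha1(candidate, BLOCK_SIZE), firstBlockHash)) {
            return false; // at most BLOCK_SIZE bytes read
        }
        return Arrays.equals(sha1(candidate, Long.MAX_VALUE), fullHash);
    }

    /** SHA-1 over at most the first {@code limit} bytes of the file. */
    private static byte[] sha1(Path file, long limit)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] buf = new byte[8192];
        long remaining = limit;
        try (InputStream in = Files.newInputStream(file)) {
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n == -1) {
                    break; // end of file reached before the limit
                }
                md.update(buf, 0, n);
                remaining -= n;
            }
        }
        return md.digest();
    }
}
```

Usage would look something like `FileSignature.of(known).matches(candidate)`. Note that a confirmed match still costs a full read of the candidate; the savings come from rejecting non-matching files after the size check or the first 512 bytes.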
