This problem might be a common one, but since I don’t know the terms associated with it, I couldn’t search for it (unless Google accepted entire paragraphs as search queries).
I have a file. It can be a text file, an MP3 file, a video clip or even a HUGE mkv file.
I have access to this file and now I have to process it in some way so that I get some kind of value or unique identifier: a hash, or something like it. I store it somewhere. This “hash” has to be small, just several bytes. It shouldn’t be half the file size!
Later on, when I am presented with a file again, I have to verify whether it was the same original file using that value I got in step 1. I will NOT have access to the original file this time. All I have will be that value from step 1.
This algorithm should return true if the second file contains the exact same data – every single bit – as the first file (basically the same file) even if the file name, attributes, location etc have all changed.
Basically I need to know whether I am dealing with the same file, even if it has been moved, renamed and had all its attributes changed, without having access to both files at the same time.
This has to be independent of the OS and the file system.
Is there a way to accomplish this?
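What you’re looking for are cryptographic hash algorithms (SHA-256 is a common one). They map a file of any size to a short, fixed-length value (32 bytes for SHA-256) that depends only on the file’s contents, not on its name, location or attributes. Two different files producing the same value is possible in theory, but for a modern algorithm it is so unlikely that it can be ignored in practice. Read up on them.
Most mainstream languages and their standard libraries offer support for calculating hashes.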
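As a rough sketch of how this could look in Python, assuming SHA-256 from the standard `hashlib` module is an acceptable choice: the file is read in chunks so that even a huge mkv never has to fit in memory. The function names `file_digest` and `is_same_file` and the 1 MiB chunk size are made up for this illustration.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 digest (32 bytes) of the file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def is_same_file(path, stored_digest):
    """Compare a file's digest against a digest saved earlier."""
    return file_digest(path) == stored_digest
```

Step 1 would be to call `file_digest(...)` on the original file and store the 32 bytes somewhere. Later, `is_same_file(other_path, stored)` returns True only if the contents hash to the same value, regardless of file name, attributes or location.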