I’m developing a program that needs to load and save data in external files. After looking at the options, I have chosen to save the data in a binary file.
Since I don’t want anyone to be able to edit the file easily, I thought about writing the file’s MD5 sum on its first line. That way, if any data in the file is changed, the sum will no longer match the one on the first line.
The problem is that if I calculate the MD5 and then write it into the file, the file’s contents change, so its sum will obviously be different. How can I get around this?
If you can suggest a better option than the sum, that would be equally welcome.
Thanks in advance.
What is your threat model?
If you just want to protect against casual fiddling, compute the MD5 of the main data and append the sum to the end of the file. To validate, strip off the trailing MD5 sum, compute the MD5 of the remaining data, and compare the two.
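The append-and-strip approach can be sketched in Java roughly as follows. This is a minimal illustration, not a definitive implementation: the class name `ChecksummedData` and the methods `seal` and `open` are hypothetical, and it assumes the whole file fits in memory as a byte array.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ChecksummedData {
    // An MD5 digest is always 16 bytes long.
    private static final int MD5_LENGTH = 16;

    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // No security provider on this JVM offers MD5.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the payload followed by the 16-byte MD5 digest of the payload. */
    static byte[] seal(byte[] payload) {
        byte[] digest = md5(payload);
        byte[] out = Arrays.copyOf(payload, payload.length + MD5_LENGTH);
        System.arraycopy(digest, 0, out, payload.length, MD5_LENGTH);
        return out;
    }

    /** Strips the trailing digest, re-hashes the rest, and returns the payload if they match. */
    static byte[] open(byte[] fileContents) {
        if (fileContents.length < MD5_LENGTH) {
            throw new IllegalArgumentException("file too short to contain a checksum");
        }
        int payloadLength = fileContents.length - MD5_LENGTH;
        byte[] payload = Arrays.copyOfRange(fileContents, 0, payloadLength);
        byte[] stored = Arrays.copyOfRange(fileContents, payloadLength, fileContents.length);
        if (!MessageDigest.isEqual(stored, md5(payload))) {
            throw new IllegalStateException("checksum mismatch: file was modified or corrupted");
        }
        return payload;
    }
}
```

When saving, you would write the result of `seal(data)` to disk; when loading, you would read the whole file and pass it to `open`. Because the digest covers only the payload, writing it into the file does not change the value being checked.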
If you want to protect against malicious and skilled cracking, you’re out of luck; any validation algorithm you use can be replicated, particularly if they have access to the program itself. Even a cryptographic signature could fail if the attacker extracts the key from the program binary.
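To make the key-extraction weakness concrete, here is a rough sketch that uses a keyed hash (HMAC-SHA256, which is a message authentication code rather than a true signature). The class name `KeyedTag` and the key value are hypothetical. The point is that the key has to live somewhere in the program, so anyone who can read the program can recompute a valid tag for a modified file.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class KeyedTag {
    // Hypothetical key shipped inside the program. An attacker who
    // decompiles the class file can read it and forge valid tags.
    private static final byte[] EMBEDDED_KEY =
            "example-key-not-secret".getBytes(StandardCharsets.UTF_8);

    /** Computes the HMAC-SHA256 tag of the data under the embedded key. */
    static byte[] tag(byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(EMBEDDED_KEY, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the stored tag matches a freshly computed one. */
    static boolean verify(byte[] data, byte[] storedTag) {
        return MessageDigest.isEqual(storedTag, tag(data));
    }
}
```

This stops someone who edits the file by hand and does not look inside the program, but it is no stronger than the secrecy of `EMBEDDED_KEY`.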
If it’s a big deal, a Unix solution is to run the program setuid or setgid to a different user and write to a directory which ordinary users cannot modify. I’m not sure what a good general Java solution is, but the point remains: users shouldn’t be able to modify your data because they were prevented from doing so, not because they were detected trying to.