I am developing a web app (using Java EE 6 with GF 3.1) that allows users to upload PDF files. Since this is a small, closed community, there is a good chance that an uploaded file is already in the system. I can't just check the file name for duplicates, since that is clearly not enough. I was thinking about hashing the entire file and storing the hash in a database. Is this feasible, and how can I achieve it? If not, what is a better way?
Consider using a checksum.
An example of calculating one with java.util.zip is at http://www.exampledepot.com/egs/java.util.zip/CalculateChecksum.html
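As a rough sketch of what calculating a file checksum with java.util.zip can look like (not the code from the linked page; the class name FileChecksum, the choice of CRC32, and the Java 7+ try-with-resources syntax are assumptions for illustration):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical helper class, named for illustration only.
class FileChecksum {

    // Streams the file through a CRC32 so the whole file never has to fit in memory.
    static long crc32(String path) throws IOException {
        try (CheckedInputStream in = new CheckedInputStream(
                new BufferedInputStream(new FileInputStream(path)), new CRC32())) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Nothing to do here: each read updates the checksum as a side effect.
            }
            return in.getChecksum().getValue();
        }
    }
}
```

The returned long could be stored next to the file record, for example in an indexed column, so that candidates with the same checksum are cheap to look up.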
Edit:
Be aware that matching checksums cannot prove that two files are identical, but checksums are still very useful.
If two files are the same, they will have the same checksum.
So if the checksums differ, you know absolutely that the files differ.
But two different files can also sometimes have the same checksum.
So the way to use this is to calculate the checksums first – if they differ, the files are different. If they’re the same, you’ll have to do a byte-by-byte comparison. That’s slower, of course, but it won’t happen often.
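The two-step check described above could be sketched roughly as follows. The class and method names (DuplicateCheck, isDuplicate, sameBytes) are made up for illustration, CRC32 is just one possible checksum, and the code assumes Java 7 or later. In a real application the checksum of the existing file would probably be read from the database rather than recomputed.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical helper class, named for illustration only.
class DuplicateCheck {

    // Computes a CRC32 checksum by streaming the file.
    static long checksum(File file) throws IOException {
        try (CheckedInputStream in = new CheckedInputStream(
                new BufferedInputStream(new FileInputStream(file)), new CRC32())) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Each read updates the checksum.
            }
            return in.getChecksum().getValue();
        }
    }

    // Slow path: compares the two files byte by byte.
    static boolean sameBytes(File a, File b) throws IOException {
        if (a.length() != b.length()) {
            return false; // different sizes can never be identical
        }
        try (InputStream x = new BufferedInputStream(new FileInputStream(a));
             InputStream y = new BufferedInputStream(new FileInputStream(b))) {
            int bx;
            while ((bx = x.read()) != -1) {
                if (bx != y.read()) {
                    return false;
                }
            }
            return y.read() == -1;
        }
    }

    // Fast path first: differing checksums mean the files certainly differ.
    // Equal checksums are only a hint, so confirm with the byte comparison.
    static boolean isDuplicate(File a, File b) throws IOException {
        if (checksum(a) != checksum(b)) {
            return false;
        }
        return sameBytes(a, b);
    }
}
```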
Note that all of this applies to hash codes as well.
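If you go with a hash instead, as the question suggests, a minimal sketch using java.security.MessageDigest might look like this. The choice of SHA-256, the class name FileHash, and the hex encoding are assumptions for illustration, and the code assumes Java 7 or later. The same caveat applies: equal hashes are strong evidence of equal files, not proof.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, named for illustration only.
class FileHash {

    // Returns the SHA-256 digest of the file as a lowercase hex string,
    // which is convenient to store in a database column.
    static String sha256Hex(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n); // feed only the bytes actually read
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

A unique index on the hash column would then let the database do the duplicate lookup on upload.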