We are currently planning a website where people can upload movies. On YouTube you can see that some movies are uploaded two or more times (by different users). To scale our application, we’re thinking about the following idea:
- User selects movie file to be uploaded
- A script computes the SHA-256 hash of the file (it’s more collision-resistant than MD5) before it gets uploaded
- The website will check if the hash already exists
- If the hash doesn’t exist, the file will be uploaded
- If the hash does exist, a message is shown or a reference to the existing version on the server is created, without the video being uploaded.
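To make the server side of this idea concrete, here is a rough sketch of the lookup step. It assumes an in-memory `Map` standing in for the real database; `checkHash`, `registerVideo`, and the `videoId` values are hypothetical names used only for illustration.

```javascript
// Hypothetical server-side index: SHA-256 hex digest -> stored video record.
// A Map stands in for whatever database the site would really use.
const videosByHash = new Map();

// Answers the "does this hash already exist?" question.
// Returns { upload: true } for a new file, or a reference to the
// existing video when the same hash has been seen before.
function checkHash(hash) {
  const existing = videosByHash.get(hash);
  return existing
    ? { upload: false, videoId: existing.videoId }
    : { upload: true };
}

// Called once an upload has finished, so later uploads of the same file
// can be answered with a reference instead of storing a second copy.
// If the hash is already registered, the first videoId is kept.
function registerVideo(hash, videoId) {
  if (!videosByHash.has(hash)) {
    videosByHash.set(hash, { videoId });
  }
  return videosByHash.get(hash).videoId;
}
```

The client would call something like `checkHash` first and only start the upload when it answers `upload: true`.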
Q: How do we read a file with JavaScript in order to compute its SHA-256 hash, and is SHA-256 good enough, or should we consider SHA-512 (or another algorithm)?
Use the HTML5 File API to read the file: http://www.html5rocks.com/en/tutorials/file/dndfiles. Here is a JavaScript implementation of SHA-256: http://www.webtoolkit.info/javascript-sha256.html
I must add that I have never tried this, but it seems to be possible. Alxandr is right that this would take a very long time for large videos, but you could try the Web Workers API to avoid freezing the browser: http://html5rocks.com/en/tutorials/workers/basics