For a recent MVC3 project I ended up storing files in an MSSQL database as byte[] (MVC3 turns this into a varbinary(max) column).
Once files are committed to the database, they will NEVER change.
Now I'm trying to be really clever and avoid storing the same file multiple times, even if users add the same file over and over again.
My first idea for avoiding this was to write a WHERE query that looks for an existing match for the byte[] I'm trying to add. However, I'm worried that this query is too heavy, as files can be around 100 MB in size and multiple people can be adding them at the same time.
To try to be even more clever I could use some information about the file to narrow the search. I currently have the following relevant columns in the model.
public byte[] FileData { get; set; }
public String MimeType { get; set; }
public double FileSizeMb { get; set; }
I could first search for files with the same MIME type and file size, and possibly add a hash of the byte[] as well, to see whether I get a match on those before I try to match the byte[] itself. That way I'm only comparing the full contents when I have a file of the right size, the right type, and the same hash, which should be less heavy.
Are there any better options to accomplish this? How would I best solve this problem?
If it makes any difference, I'm using MSSQL 2012.
You should compute a hash of your FileData (on the web server side, of course), and search using the hash and the file size. If you have a match, you should download the matching FileData to the web server and compare there (not the other way around).
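A minimal sketch of that flow, under some assumptions of mine rather than anything from the question: the StoredFile class, the Sha256Hex and FileSizeBytes properties, and the FileDeduplicator helper are all hypothetical names, SHA-256 is just one reasonable hash choice, and the lookup runs over an in-memory collection to keep the example self-contained. Against a real database you would presumably run the size-and-hash filter as the query and only then load FileData for the few candidates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical stand-in for the stored row; only FileData comes from the question.
public class StoredFile
{
    public int Id { get; set; }
    public byte[] FileData { get; set; }
    public long FileSizeBytes { get; set; }  // assumption: exact byte count rather than FileSizeMb
    public string Sha256Hex { get; set; }    // assumption: extra column holding the hash
}

public static class FileDeduplicator
{
    // Hash the upload on the web server so the large byte[] is never
    // sent to the database just to be compared.
    public static string ComputeSha256Hex(byte[] data)
    {
        using (var sha = SHA256.Create())
        {
            // BitConverter yields "AB-CD-..."; strip the dashes to get plain upper-case hex.
            return BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "");
        }
    }

    // Returns the already stored file with identical contents, or null if there is none.
    public static StoredFile FindDuplicate(IEnumerable<StoredFile> files, byte[] upload)
    {
        string hash = ComputeSha256Hex(upload);
        long size = upload.LongLength;

        return files
            // Cheap filter first: size and hash only.
            .Where(f => f.FileSizeBytes == size && f.Sha256Hex == hash)
            // Full byte-by-byte comparison only for the few candidates left.
            .FirstOrDefault(f => f.FileData.SequenceEqual(upload));
    }
}
```

Storing the size as an exact byte count is a deliberate choice in this sketch: equality checks on a double such as FileSizeMb can be unreliable. With an index on the hash and size columns the initial lookup should stay cheap no matter how large the files are.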