I am working on an online file management project. We store file references in the database (SQL Server) and the file data on the file system. We are having trouble coordinating the two, both when uploading and when deleting a file: should we create the reference in the database first, or store the file on the file system first? The problem is that if we create the reference in the database first and an error occurs while storing the file, the database ends up with a reference to a file that has no data on the file system. Please suggest how to deal with this situation, and explain the reasoning. I badly need help with this.
This is actually a little easier than you think.
First, you need to decide the “single source of truth”.
That is, when the file system and the DB disagree, which one is correct?
The reason for this is that it makes it easier to resolve conflicts.
You should assume that the database is your source, and that the file system is a shadow of the database. This seems counterintuitive: how can an entry exist in the DB if the file isn’t on the file system? Obviously it can’t. But if the file isn’t in the DB, then as far as your application is concerned “it doesn’t exist” anyway. So the file system reflects the state of the DB, not the other way around.
Given these assumptions, you end up with these conflict resolution rules.
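For any given file:
1. It is in the DB and on the file system: everything is fine.
2. It is in neither: the file doesn’t exist, and that is fine too.
3. It is in the DB but not on the file system: you have data corruption.
4. It is on the file system but not in the DB: delete the file.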
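A file can be in the DB or not, and on disk or not, so there are exactly four combinations, each with a fixed resolution once the DB is treated as the source of truth. As a minimal sketch, the rules can be written as a lookup; the names `FileState` and `classify` are hypothetical, chosen only for illustration.

```python
from enum import Enum


class FileState(Enum):
    # One member per (in_db, on_disk) combination, with the action to take.
    CONSISTENT = "in DB and on disk: nothing to do"
    ABSENT = "in neither: the file does not exist"
    CORRUPT = "in DB but missing on disk: data corruption, investigate"
    ORPHAN = "on disk but not in DB: safe to delete the file"


def classify(in_db: bool, on_disk: bool) -> FileState:
    """Apply the DB-is-the-source-of-truth rules to a single file."""
    if in_db and on_disk:
        return FileState.CONSISTENT
    if in_db:
        return FileState.CORRUPT
    if on_disk:
        return FileState.ORPHAN
    return FileState.ABSENT
```

Only the ORPHAN case is ever repaired automatically; CORRUPT is never "fixed" by deleting the DB row, because the DB is assumed to be right.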
When uploading files, there is a grey area: the file has been uploaded but not yet acknowledged by the DB.
To handle this, upload the file in a staging mode.
An easy way to do this is to upload the file to a different directory on the same physical file system, or to upload it to its final location under a temporary file name. Either way, the file is easily identifiable as “in process” by its name or location.
You want the file staged on the same file system for two reasons. First, disk space: if the disk doesn’t fill up while you upload, then you KNOW the file will fit in its final resting place, because it has already “reserved” the space. Second, when you finally put the file in place, that operation must be atomic, and a rename within a single file system is atomic on modern file systems (POSIX rename(), for example, requires this). You can’t have a file “halfway renamed”, even when the rename overwrites an existing file (the original is removed as part of the rename).
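A minimal sketch of the staging step, assuming the upload arrives as a readable binary stream and that `staging_dir` and `final_dir` are hypothetical directory names for your setup. It refuses to stage onto a different device (compared via `os.stat(...).st_dev`), writes under a `.part` temporary name so the file is recognisably “in process”, and fsyncs before returning so the bytes are on disk before the DB is touched.

```python
import os
import shutil
import tempfile
from typing import BinaryIO


def stage_upload(src: BinaryIO, staging_dir: str, final_dir: str) -> str:
    """Copy an incoming upload into staging_dir and return the staged path.

    staging_dir must be on the same file system as final_dir so that the
    later rename is atomic and the disk space is already reserved.
    """
    if os.stat(staging_dir).st_dev != os.stat(final_dir).st_dev:
        raise ValueError("staging and final directories are on different file systems")
    # The ".part" suffix marks the file as "in process" for humans and the reaper.
    fd, staged_path = tempfile.mkstemp(dir=staging_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out:
            shutil.copyfileobj(src, out)  # raises OSError (e.g. disk full) mid-copy
            out.flush()
            os.fsync(out.fileno())        # data is durable before any DB work
    except BaseException:
        os.remove(staged_path)  # never leave a half-written staged file behind
        raise
    return staged_path
```

If the disk fills, the error surfaces here, before any DB record exists, which is exactly the failure the question describes.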
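Once staged, your operation becomes:
1. Start a DB transaction and insert the file’s record.
2. Rename the staged file to its final name.
3. Commit the DB transaction.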
If the rename fails, abort and roll back the DB transaction, which removes the entry. If the rename succeeds but the DB commit fails, the file is on disk with no DB record: an orphan, which is harmless because, as far as the DB is concerned, it doesn’t exist. Retry the upload until it succeeds.
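The commit step can be sketched as follows. This is an illustration only: Python’s built-in `sqlite3` stands in for SQL Server so the example runs on its own, and the `files(id, path)` table and the `commit_upload` name are hypothetical. `os.replace` performs the rename and overwrites any existing target.

```python
import os
import sqlite3


def commit_upload(conn: sqlite3.Connection, file_id: str,
                  staged_path: str, final_path: str) -> None:
    """Insert the DB record, rename the staged file into place, then commit.

    - INSERT or rename fails -> roll back, so no record points at a missing file.
    - Commit fails           -> the file is an orphan on disk; the caller retries
                                the upload, and the reaper removes it otherwise.
    """
    try:
        # sqlite3 implicitly opens a transaction before this INSERT.
        conn.execute("INSERT INTO files (id, path) VALUES (?, ?)",
                     (file_id, final_path))
        os.replace(staged_path, final_path)  # atomic within one file system
    except BaseException:
        conn.rollback()
        raise
    conn.commit()
```

On a retry after a failed commit, the caller stages the upload again and calls `commit_upload` once more; `os.replace` simply overwrites the orphan left at `final_path`.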
To delete a file, do this:
1. Delete the file’s record from the DB and commit.
2. Delete the file from the file system.
If the DB delete fails, don’t delete the file. If the DB delete succeeds but the file deletion fails, you are left with an orphan: a file on disk with no DB record, which is safe to clean up later.
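A matching sketch of the delete path, under the same assumptions as before (`sqlite3` standing in for SQL Server, a hypothetical `files(id, path)` table, and a hypothetical `delete_file` name). The DB delete must commit before the file is touched; a failed file removal is tolerated because it only produces an orphan.

```python
import os
import sqlite3


def delete_file(conn: sqlite3.Connection, file_id: str) -> None:
    """Delete the DB record first; only once that commits, delete the file."""
    row = conn.execute("SELECT path FROM files WHERE id = ?", (file_id,)).fetchone()
    if row is None:
        return  # no record: as far as the system is concerned, it doesn't exist
    try:
        conn.execute("DELETE FROM files WHERE id = ?", (file_id,))
        conn.commit()
    except BaseException:
        conn.rollback()
        raise  # DB delete failed: fail fast and leave the file alone
    try:
        os.remove(row[0])
    except OSError:
        # The record is already gone, so the file is now an orphan.
        # Nothing to roll back; a periodic cleanup can remove it later.
        pass
```

In a real service you would probably log the swallowed `OSError` rather than ignore it silently.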
Finally, you have a reaper process that regularly (daily, weekly, whatever) compares the DB to the file system, deleting any files that are not in the database. Since the DB is the “Single source of truth”, the two stores will eventually be in sync.
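A reaper could look roughly like the sketch below, again with `sqlite3` standing in for SQL Server and a hypothetical `files(id, path)` table that stores absolute paths. One addition that is not part of the answer above: files younger than `min_age_seconds` are skipped, so an upload that has been renamed into place but not yet committed is not deleted mid-flight. The same function can be pointed at the staging directory to clear out abandoned `.part` files.

```python
import os
import sqlite3
import time


def reap_orphans(conn: sqlite3.Connection, files_dir: str,
                 min_age_seconds: float = 24 * 3600) -> list[str]:
    """Delete files in files_dir that have no row in the files table."""
    known = {os.path.normpath(p) for (p,) in conn.execute("SELECT path FROM files")}
    now = time.time()
    removed = []
    for entry in os.scandir(files_dir):
        if not entry.is_file():
            continue
        path = os.path.normpath(entry.path)
        if path in known:
            continue  # the DB says this file exists: keep it
        if now - entry.stat().st_mtime < min_age_seconds:
            continue  # too recent: may be an upload that is still committing
        os.remove(path)
        removed.append(path)
    return removed
```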
If a file that has a DB record goes missing, then you have data corruption. Don’t let that happen. It means there is a bug, or someone is walking over your file system.
The retry behaviour of the upload process and the fast fail of the delete process give you a pseudo two-phase commit in which it is easy to check what’s right and wrong, and easy to correct back to the proper state.