I have a file stored on disk that can be accessed from multiple servers in a web farm. This file is updated as necessary based on data changes in the database. I have a database table that stores a row with a URI for this file and some hashes computed from other database tables. If the hashes don’t match their respective tables, then the file needs to be regenerated and a new row needs to be inserted.
How do I make it so that only 1 client regenerates this file and inserts a row?
The easiest but worst solution (because of locks) is to:
    BEGIN TRANSACTION
    SELECT ROW FROM TABLE (lock the table for the remainder of the transaction)
    IF ROW IS OUT OF DATE:
        REGENERATE FILE
        INSERT ROW INTO TABLE
    DO SOME STUFF WITH FILE (30s)
    COMMIT TRANSACTION
However, if multiple clients execute this code, all of the subsequent clients sit for a long time while the “DO SOME STUFF WITH FILE” processes.
Is there a better way to handle this? Maybe changing the way I process the file before the commit to make it faster? I’ve been stumped on this for a couple days.
The answer depends on the details of the file-level processing.
If you just swap the database and file operations, you risk corruption of the file or busy waiting (depending on how exactly you open it, and what your code does when a concurrent open is rejected). Busy waiting would definitely be worse than waiting on a database lock from a throughput (or any other) perspective.
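To make the "concurrent open is rejected" case concrete, here is a minimal Python sketch of an exclusive-create lock file. The names try_claim, release and regen.lock are hypothetical, and it assumes the shared filesystem honours atomic exclusive creation, which is not guaranteed on every network filesystem. The point is only that the rejected client gets an immediate answer and then has to decide what to do; if it simply retries in a loop, that is the busy waiting described above.

```python
import os
import tempfile


def try_claim(lock_path: str) -> bool:
    """Try to create lock_path exclusively.

    Returns True if this caller created the file (it now owns the claim),
    False if the file already existed (another client owns it).
    """
    try:
        # O_CREAT | O_EXCL fails with FileExistsError if the path exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(lock_path: str) -> None:
    """Give up the claim by removing the lock file."""
    os.remove(lock_path)


if __name__ == "__main__":
    # Hypothetical lock location; a real deployment would use the shared disk.
    lock = os.path.join(tempfile.mkdtemp(), "regen.lock")
    if try_claim(lock):
        try:
            print("this client regenerates the file")
        finally:
            release(lock)
    else:
        print("another client is regenerating; decide whether to wait or skip")
```

Note that a client that crashes while holding the claim leaves the lock file behind, so this approach would also need some form of timeout or cleanup.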
If your file processing really takes long enough that requests frequently queue up behind it, the only solutions are to add more powerful hardware or to optimize the file-level processing.
For example, if the file only reflects the data in the database, you might get away with not updating it from the request path at all, and instead having a background process that periodically regenerates its content based on the data in the database. You might need to add versioning that makes sure that whoever reads the file is not receiving stale data. If the file pointed to by the URL has a new name every time, you might need an error handler that makes sure that GET requests are not habitually receiving a 404 response on new files.
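One way the regeneration step of such a background process might look is sketched below in Python. Everything named here is an assumption for illustration: content_version, publish, the export-<version>.dat naming scheme, and the render callback that turns rows into file content. The sketch derives the file name from a hash of the source data, writes to a temporary file in the same directory, and then renames it into place, so a reader never sees a half-written file under the final name.

```python
import hashlib
import os
import tempfile


def content_version(rows) -> str:
    """Hash the source rows so the version changes only when the data does."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()[:16]


def publish(directory: str, rows, render) -> str:
    """Write render(rows) under a versioned name and return the final path.

    If a file for this version already exists, nothing is rewritten.
    """
    version = content_version(rows)
    final_path = os.path.join(directory, f"export-{version}.dat")
    if os.path.exists(final_path):
        return final_path  # data unchanged since the last run

    # Write to a temp file in the same directory, then rename into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(render(rows))
        os.replace(tmp_path, final_path)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

A scheduler would call publish periodically and store the returned path, together with the hashes, in the table row only after the file is in place. Ordering it that way is one option for keeping readers from following a URL to a file that does not exist yet.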