I have a priority queue table that can be simplistically represented as:
CREATE TABLE test (
    id int PRIMARY KEY,
    priority int,
    status int
);
where priority is the priority (lower values are retrieved first) and status is the item state (0 means OK to retrieve). The task is to retrieve the element with the highest priority and mark it as taken by setting its status to 1, so the code goes like this (pseudocode):
tries = 0;
while (tries++ < max_tries) {
    result = query("SELECT id FROM test WHERE status=0 ORDER BY priority LIMIT 0,1");
    if (result not empty) {
        /* found a row, try to update */
        result2 = query("UPDATE test SET status=1 WHERE id={result[id]} AND status=0");
        if (affected_rows(result2) > 0) {
            /* if the update worked, we own this ID */
            return result[id];
        }
        /* otherwise another process took it first: try again */
    }
}
return NULL;
This code should run (with appropriate modifications, such as different LIMIT syntax) on any SQL database (the current requirement is MySQL, Oracle, SQL Server and DB2, but there may be more). As far as I know, all of the required DBs support an “affected rows” API for updates. Are there any potential problems or pitfalls with this approach? Should SELECT … FOR UPDATE be used in the above statement, and if so, why?
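The SELECT / UPDATE combination listed seems to address the concurrency issue through the UPDATE’s WHERE clause: because of the AND status=0 condition, a process that loses the race to another one updates zero rows and simply tries again. I would, however, lean toward a stored procedure if that were acceptable: depending on the SQL flavour, the stored procedure could close the gap between the SELECT and the UPDATE with a single atomic “UPDATE … RETURNING id INTO …” (Oracle PL/SQL syntax; SQL Server offers a similar OUTPUT clause).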
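To check that reasoning, here is a minimal sketch of the question's loop in Python, using the standard-library sqlite3 module as a stand-in database. SQLite is not one of the target DBs, and the helper names make_queue, try_claim and claim_next are hypothetical. The last three lines simulate two workers whose SELECTs both returned the same id: the first conditional UPDATE changes one row, the second changes none, so that worker has to retry.

```python
import sqlite3


def make_queue(rows):
    """Create an in-memory copy of the question's `test` table (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, priority INTEGER, status INTEGER)"
    )
    conn.executemany("INSERT INTO test (id, priority, status) VALUES (?, ?, ?)", rows)
    return conn


def try_claim(conn, item_id):
    """Conditional UPDATE: changes the row only if nobody has claimed it yet."""
    cur = conn.execute(
        "UPDATE test SET status = 1 WHERE id = ? AND status = 0", (item_id,)
    )
    return cur.rowcount == 1  # rowcount plays the role of affected_rows()


def claim_next(conn, max_tries=5):
    """SELECT the best candidate, then try to claim it; retry if we lost the race."""
    for _ in range(max_tries):
        row = conn.execute(
            "SELECT id FROM test WHERE status = 0 ORDER BY priority LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left with status 0
        if try_claim(conn, row[0]):
            return row[0]
        # another worker claimed this id between our SELECT and our UPDATE
    return None


conn = make_queue([(1, 10, 0), (2, 5, 0), (3, 7, 0)])

# Simulated race: workers A and B both ran the SELECT and both saw id 2.
print(try_claim(conn, 2))  # True: A's UPDATE changed the row
print(try_claim(conn, 2))  # False: 0 rows affected, so B must retry
print(claim_next(conn))    # 3: B's retry picks the next-best item
```

The simulation runs on a single connection, so it only shows the logic of the affected-rows check, not real concurrent execution.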
Regarding the states, if they are something like 0 = available and 1 = being processed, then the state acts as a lock. There would be a concern if the process dies after the state changes from 0 to 1 but before the job is actually processed: the record can remain locked and unprocessed until a cleanup job runs.
Another approach is to use a separate timestamp column for the lock. The query then includes a filter “WHERE now() - timeout > NVL(lockTimestamp, beginning of time)” (NVL is Oracle-specific; COALESCE is the portable equivalent), and locking is accomplished by setting lockTimestamp to now() rather than relying on setting status=1. The UPDATE that takes the lock should repeat the same filter, just as the original UPDATE repeats status=0, so that only one process can take over an expired lock. This way the lock auto-expires after a reasonable wait (timeout) and the item becomes available again to be picked up by the next processor. No clean-up job is required.
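The timestamp-based lock described above can be sketched with Python's sqlite3 module as a stand-in database. This is an illustration under several assumptions, not a drop-in implementation: times are Unix seconds, so COALESCE(lockTimestamp, 0) stands in for NVL(lockTimestamp, beginning of time); `now` is passed in explicitly so expiry can be shown without sleeping; and status = 1 is reused to mean “done”. TIMEOUT, claim_next and complete are hypothetical names.

```python
import sqlite3

TIMEOUT = 60  # hypothetical lock lifetime in seconds


def make_queue(rows):
    """The question's table plus a nullable lockTimestamp column (NULL = never locked)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, priority INTEGER,"
        " status INTEGER, lockTimestamp INTEGER)"
    )
    conn.executemany("INSERT INTO test (id, priority, status) VALUES (?, ?, ?)", rows)
    return conn


# Not done, and either never locked or locked more than `timeout` seconds ago.
AVAILABLE = "status = 0 AND ? - ? > COALESCE(lockTimestamp, 0)"


def claim_next(conn, now, timeout=TIMEOUT, max_tries=5):
    for _ in range(max_tries):
        row = conn.execute(
            f"SELECT id FROM test WHERE {AVAILABLE} ORDER BY priority LIMIT 1",
            (now, timeout),
        ).fetchone()
        if row is None:
            return None
        # Repeat the availability test so only one worker can take over a lock.
        cur = conn.execute(
            f"UPDATE test SET lockTimestamp = ? WHERE id = ? AND {AVAILABLE}",
            (now, row[0], now, timeout),
        )
        if cur.rowcount == 1:
            return row[0]
    return None


def complete(conn, item_id, claimed_at):
    """Mark an item done, unless another worker has since re-taken our lock."""
    cur = conn.execute(
        "UPDATE test SET status = 1 WHERE id = ? AND lockTimestamp = ?",
        (item_id, claimed_at),
    )
    return cur.rowcount == 1


conn = make_queue([(1, 5, 0), (2, 9, 0)])
t0 = 1_000_000
a = claim_next(conn, now=t0)                # worker A locks item 1
b = claim_next(conn, now=t0 + 1)            # item 1 is locked, so B gets item 2
c = claim_next(conn, now=t0 + TIMEOUT + 1)  # A died; its lock has expired
print(a, b, c)  # 1 2 1
```

Checking lockTimestamp in complete is one way to stop a worker that outlived its timeout from finishing an item that someone else has already re-claimed; the timeout itself should comfortably exceed the longest expected job.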
Is there any chance of starvation, where a low-priority job never gets picked up?
Does an additional ordering need to be added (a secondary sort key after priority)? If the items all have the same priority, does it matter which ones are processed first (FIFO, LIFO)?
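On the ordering question: with ORDER BY priority alone, the database is free to return tied rows in any order, and that order may differ between products, so a secondary sort key is what makes the choice among equal priorities well defined. A minimal SQLite sketch, assuming ids increase with insertion time (true for the sample data here, but not guaranteed by the question's schema): ORDER BY priority, id gives FIFO within a priority, and ORDER BY priority, id DESC gives LIFO. ROWS and drain are hypothetical names.

```python
import sqlite3

# (id, priority, status); ids assumed to increase with insertion time
ROWS = [(1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 1, 0)]


def drain(order_by):
    """Claim every item one by one and return the ids in the order they were taken."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, priority INTEGER, status INTEGER)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)
    taken = []
    while True:
        # order_by is a fixed string chosen by the caller below, never user input
        row = conn.execute(
            f"SELECT id FROM test WHERE status = 0 ORDER BY {order_by} LIMIT 1"
        ).fetchone()
        if row is None:
            conn.close()
            return taken
        conn.execute("UPDATE test SET status = 1 WHERE id = ? AND status = 0", (row[0],))
        taken.append(row[0])


print(drain("priority, id"))       # FIFO among ties: [2, 1, 3, 4]
print(drain("priority, id DESC"))  # LIFO among ties: [2, 4, 3, 1]
```

If the id is not a reliable proxy for arrival order, a creation-timestamp column would serve the same purpose as the tie-breaker.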
I guess it should be noted that if multiple instances of this fetch routine are running, multiple queue items may be processed concurrently and the order of processing is not necessarily sequential (process1 gets item1, process2 gets item2, process2 completes item2, process1 completes item1, …). Hopefully that is acceptable; otherwise the locking needs to check that no other item is still in progress.
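I also don’t particularly see a need for SELECT … FOR UPDATE: the AND status=0 condition in the UPDATE, together with the affected-rows check, already ensures that only one process can claim a given row. FOR UPDATE would mostly add blocking between processes, and it only helps if the SELECT and the UPDATE run inside the same transaction.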