Given a table that is acting as a queue, how can I best configure the table/queries so that multiple clients process from the queue concurrently?
For example, the table below indicates a command that a worker must process. When the worker is done, it will set the processed value to true.
| ID | COMMAND | PROCESSED |
| 1 | ... | true |
| 2 | ... | false |
| 3 | ... | false |
The clients might obtain one command to work on like so:
select top 1 COMMAND
from EXAMPLE_TABLE
with (UPDLOCK, ROWLOCK)
where PROCESSED=false;
However, if there are multiple workers, each tries to get the row with ID=2. Only the first will get the pessimistic lock; the rest will wait. Then one of them will get row 3, etc.
What query/configuration would allow each worker client to get a different row, so that they can all work concurrently?
EDIT:
Several answers suggest variations on using the table itself to record an in-process state. I thought that this would not be possible within a single transaction. (i.e., what’s the point of updating the state if no other worker will see it until the txn is committed?) Perhaps the suggestion is:
# start transaction
update to 'processing'
# end transaction
# start transaction
process the command
update to 'processed'
# end transaction
Is this the way people usually approach this problem? It seems to me that the problem would be better handled by the DB, if possible.
I recommend you go over Using tables as Queues.
Properly implemented queues can handle thousands of concurrent users and service as many as half a million enqueue/dequeue operations per minute. Until SQL Server 2005 the solution was cumbersome: it involved mixing a SELECT and an UPDATE in a single transaction and giving just the right mix of lock hints, as in the article linked by gbn. Luckily, since SQL Server 2005 introduced the OUTPUT clause, a much more elegant solution is available, and MSDN now recommends using the OUTPUT clause.
Basically there are 3 parts of the puzzle you need to get right in order for this to work in a highly concurrent manner:
The dequeue must be atomic, and this is where the OUTPUT clause comes into play.
The leftmost key of the clustered index should be the PROCESSED column. If the ID was used as a primary key, then move it to the second column in the clustered key. The debate whether to keep a non-clustered key on the ID column is open, but I strongly favor not having any secondary non-clustered indexes over queues.
The combination of an atomic dequeue, the READPAST hint when searching for elements to dequeue, and a leftmost clustered index key based on the processing bit ensures very high throughput under a highly concurrent load.
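As a rough illustration of the points above, here is a minimal T-SQL sketch against the EXAMPLE_TABLE from the question. It assumes PROCESSED is a bit column (0 = pending, 1 = done) and that ID is unique; the index name cdxExampleTable and the CTE name next_command are made up for the example. If ID is currently a clustered primary key, that constraint would have to be dropped or made non-clustered before the index can be created.

```sql
-- Assumption: PROCESSED is a bit column (0 = pending, 1 = done).
-- Hypothetical index name; PROCESSED is the leftmost key, ID the second.
CREATE CLUSTERED INDEX cdxExampleTable
    ON EXAMPLE_TABLE (PROCESSED, ID);

-- Atomic dequeue: find one pending row, skip rows that other workers
-- have locked (READPAST), mark it, and return it in a single statement.
WITH next_command AS (
    SELECT TOP (1) ID, COMMAND, PROCESSED
    FROM EXAMPLE_TABLE WITH (ROWLOCK, READPAST)
    WHERE PROCESSED = 0
    ORDER BY ID
)
UPDATE next_command
SET PROCESSED = 1
OUTPUT inserted.ID, inserted.COMMAND;
```

Because READPAST skips locked rows instead of waiting on them, two workers running this statement at the same time should each receive a different row. If a worker runs the dequeue and the actual processing inside one transaction, a rollback puts the row back as pending, and other workers simply skip it while the transaction is open. Note that READPAST is only allowed under the READ COMMITTED or REPEATABLE READ isolation levels (or under SNAPSHOT when combined with other locking hints).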