We have a workorder table. A server agent jobs grabs 100 entries from this table in a cursor and do some work. To parallelize this there are 10 server agent jobs, which call the following procedure (each with its own @process_id):
CREATE PROCEDURE sp_do_workorder @process_id INT
AS
BEGIN TRY
DECLARE @wo_id NCHAR(40),
@wo_action NVARCHAR(100),
@created_at DATETIME,
@source_proc_name NVARCHAR(100),
UPDATE procedure_ctrl SET [status]='running' WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='idle'
WHILE 1=1
BEGIN
IF NOT EXISTS (SELECT * FROM procedure_ctrl WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='running') BREAK
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
UPDATE workorder SET hash=CAST(@process_id AS NVARCHAR(100))
FROM workorder x
INNER JOIN (
SELECT TOP 100 id FROM workorder WHERE hash='' AND workorder_step=0 ORDER BY created_at ASC
) y ON x.id=y.id
COMMIT TRANSACTION
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
DECLARE wo_cur CURSOR FAST_FORWARD FOR SELECT id,action,created_at,optin_source FROM workorder WHERE hash=CAST(@process_id AS NVARCHAR(100)) AND workorder_step=0 ORDER BY created_at ASC
OPEN wo_cur
FETCH NEXT FROM wo_cur INTO @wo_id,@wo_action,@created_at,@source_proc_name
WHILE @@FETCH_STATUS=0
BEGIN
EXEC sp_basisprozess @wo_id,@wo_action,@created_at,@source_proc_name,@process_id
FETCH NEXT FROM wo_cur INTO @wo_id,@wo_action,@created_at,@source_proc_name
END
CLOSE wo_cur
DEALLOCATE wo_cur
WAITFOR DELAY '00:00:01'
END
UPDATE procedure_ctrl SET [status]='idle' WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='running'
END TRY
BEGIN CATCH
EXEC dbo.sp_listerror
DECLARE @error NVARCHAR(4000)
SET @error='[sp_do_workorder]_'+CAST(@process_id AS NVARCHAR(100))+': critical problem'
RAISERROR(@error, 12, 1)
END CATCH
We get a deadlock really often for most of these 10 agent jobs. Has anyone a hint why this is so? To prevent side effects we use serialize transaction isolation level, so only one agent job can grab one workorder entry. Without setting transaction isolation leven the deadlock are gone, but then it often occurs, that two (or more) agent jobs grab the same workorder entry.
I haven’t tried to work out specifically why the deadlock would occur in your case but it seems as though you are effectively using a table as a queue in which case see this linked article for an approach that uses the
OUTPUTclause and locking hints to maximise concurrency without deadlocks.