I have a table [File] with the following schema:
    CREATE TABLE [dbo].[File] (
        [FileID] [int] IDENTITY(1,1) NOT NULL,
        [Name] [varchar](256) NOT NULL,
        CONSTRAINT [PK_File] PRIMARY KEY CLUSTERED ([FileID] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
                  ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
The idea is that the FileID is used as the key for the table and the Name is the fully qualified path that represents a file.
What I’ve been trying to do is create a stored procedure that checks whether the Name is already in use; if so, it uses that record, otherwise it creates a new one.
But when I stress test the code with many threads executing the stored procedure at once, I get different errors.
This version of the code deadlocks and throws a deadlock exception on the client.
    CREATE PROCEDURE [dbo].[File_Create]
        @Name varchar(256)
    AS
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
    BEGIN TRANSACTION xact_File_Create
    SET XACT_ABORT ON
    SET NOCOUNT ON
    DECLARE @FileID int
    SELECT @FileID = [FileID] FROM [dbo].[File] WHERE [Name] = @Name
    IF @@ROWCOUNT=0
    BEGIN
        INSERT INTO [dbo].[File]([Name]) VALUES (@Name)
        SELECT @FileID = [FileID] FROM [dbo].[File] WHERE [Name] = @Name
    END
    SELECT * FROM [dbo].[File] WHERE [FileID] = @FileID
    COMMIT TRANSACTION xact_File_Create
    GO
With this version of the code, I end up with rows that have the same data in the Name column.
    CREATE PROCEDURE [dbo].[File_Create]
        @Name varchar(256)
    AS
    BEGIN TRANSACTION xact_File_Create
    SET NOCOUNT ON
    DECLARE @FileID int
    SELECT @FileID = [FileID] FROM [dbo].[File] WHERE [Name] = @Name
    IF @@ROWCOUNT=0
    BEGIN
        INSERT INTO [dbo].[File]([Name]) VALUES (@Name)
        SELECT @FileID = [FileID] FROM [dbo].[File] WHERE [Name] = @Name
    END
    SELECT * FROM [dbo].[File] WHERE [FileID] = @FileID
    COMMIT TRANSACTION xact_File_Create
    GO
I’m wondering what the right way to do this type of action is. In general, this is a pattern I’d like to use wherever the data is unique in a single column or across multiple columns and another column is used as the key.
Thanks
If you are searching heavily on the Name field, you will probably want it indexed (as unique, and maybe even clustered if this is the primary search field). For the existence check itself, you could just select count(*) from [File] where Name = @Name and see if it is greater than zero. Keep in mind, though, that the locks retained by the search phase are governed by the isolation level rather than by which columns are selected.
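As a minimal sketch of the indexing suggestion, assuming a nonclustered unique index is acceptable and that the table holds no duplicate names yet (the statement fails if it does), something like the following could be used. The index name UX_File_Name is only an illustration.

```sql
-- Hypothetical index name; enforces uniqueness of [Name] and lets lookups seek
-- instead of scanning the clustered index.
CREATE UNIQUE NONCLUSTERED INDEX [UX_File_Name]
    ON [dbo].[File] ([Name] ASC);
```

With the index in place, the database itself rejects a second row with the same Name, regardless of how the stored procedure is written.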
You are on the right course with the SERIALIZABLE level, because the success or failure of subsequent queries depends on whether the Name is already present. The version without it produces duplicates because two selects ran concurrently, both found no record, and both went ahead with the insert.
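The deadlock with the prior version is most likely due to the lack of an index on Name. Without one, the search has to scan the table, which takes a long time and, under SERIALIZABLE, holds shared range locks over everything it read until the transaction ends. Two sessions can each hold those locks and then each wait on the other before inserting, and SQL Server resolves that by choosing one of them as the deadlock victim. When you load the server down in a SERIALIZABLE transaction, everything else that touches the locked range has to wait for the operation to complete. The index should make the operation fast, but only testing will indicate if it is fast enough. Note that you can respond to the failed transaction by resubmitting it; in real-world situations, hopefully the load will be transient.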
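Resubmitting after a deadlock could be sketched as follows. This is only an illustration, assuming the retry is done in T-SQL rather than in the client; the sample path and the limit of three attempts are made up. Error 1205 is the number SQL Server raises for a deadlock victim.

```sql
DECLARE @Name varchar(256) = 'C:\temp\example.txt';  -- hypothetical sample value
DECLARE @Attempt int = 1;
DECLARE @MaxAttempts int = 3;                         -- arbitrary retry limit
DECLARE @IsDeadlock bit;

WHILE @Attempt <= @MaxAttempts
BEGIN
    BEGIN TRY
        EXEC [dbo].[File_Create] @Name = @Name;
        BREAK;  -- success, stop retrying
    END TRY
    BEGIN CATCH
        -- Remember whether this was a deadlock before cleaning up.
        SET @IsDeadlock = CASE WHEN ERROR_NUMBER() = 1205 THEN 1 ELSE 0 END;

        -- Roll back whatever is left of the failed transaction.
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        -- Re-raise anything that is not a deadlock, or a deadlock on the last attempt.
        IF @IsDeadlock = 0 OR @Attempt >= @MaxAttempts
        BEGIN
            THROW;
        END;

        SET @Attempt += 1;
    END CATCH
END
```

The same idea applies in client code: catch the deadlock exception and call the procedure again a bounded number of times.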
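EDIT: By making your table indexed (with a unique index on Name), but not using SERIALIZABLE, you end up with three cases:
1. The Name already exists: the select finds it and no insert is needed.
2. The Name does not exist and nobody else is adding it: the insert succeeds.
3. The Name does not exist at the time of the select, but another session inserts it first: the insert fails with a unique-key violation.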
I would expect this last case to be truly exceptional, so using an exception to capture this very rare case would be preferable to engaging SERIALIZABLE, which has serious performance consequences.
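A rough sketch of that exception-based approach, assuming a unique index on [Name] already exists and the procedure is not called inside an outer transaction with XACT_ABORT ON. The procedure name File_GetOrCreate is hypothetical. Errors 2601 and 2627 are the duplicate-key errors for a unique index and a unique constraint respectively.

```sql
CREATE PROCEDURE [dbo].[File_GetOrCreate]  -- hypothetical name
    @Name varchar(256)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @FileID int;

    -- Common case: the name is already there.
    SELECT @FileID = [FileID] FROM [dbo].[File] WHERE [Name] = @Name;

    IF @FileID IS NULL
    BEGIN
        BEGIN TRY
            INSERT INTO [dbo].[File] ([Name]) VALUES (@Name);
            SET @FileID = SCOPE_IDENTITY();
        END TRY
        BEGIN CATCH
            -- Anything other than a duplicate-key error is re-raised.
            IF ERROR_NUMBER() NOT IN (2601, 2627)
            BEGIN
                THROW;
            END;

            -- Rare case: another session inserted the same name first, so read its row.
            SELECT @FileID = [FileID] FROM [dbo].[File] WHERE [Name] = @Name;
        END CATCH
    END

    SELECT [FileID], [Name] FROM [dbo].[File] WHERE [FileID] = @FileID;
END
```

No explicit transaction is needed here, because the unique index is what guarantees that only one row per Name can ever exist.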
If you do really have an expectation that it will be common to have posts within milliseconds of one another of the same new name, then use a SERIALIZABLE transaction in conjunction with the index. It will be slower in the general case, but faster when these posts are found.
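One way this might look is sketched below. Note that it swaps in a technique the answer above does not mention: instead of SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, it uses the UPDLOCK and HOLDLOCK table hints on the lookup, so that the key range is locked serializably and two sessions cannot both hold compatible locks on it before inserting. The procedure name is hypothetical, and a unique index on [Name] is assumed.

```sql
CREATE PROCEDURE [dbo].[File_GetOrCreate_Serializable]  -- hypothetical name
    @Name varchar(256)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    DECLARE @FileID int;

    BEGIN TRANSACTION;

    -- HOLDLOCK gives serializable range locking for this table reference;
    -- UPDLOCK makes a second session wait here instead of deadlocking at the insert.
    SELECT @FileID = [FileID]
    FROM [dbo].[File] WITH (UPDLOCK, HOLDLOCK)
    WHERE [Name] = @Name;

    IF @FileID IS NULL
    BEGIN
        INSERT INTO [dbo].[File] ([Name]) VALUES (@Name);
        SET @FileID = SCOPE_IDENTITY();
    END

    COMMIT TRANSACTION;

    SELECT [FileID], [Name] FROM [dbo].[File] WHERE [FileID] = @FileID;
END
```

As the answer says, this costs more in the general case, since concurrent callers with the same new name are serialized, so it is worth stress testing both variants.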