I would like to create a custom identity based off an custom identity. Or perhaps something similar to an identity that functions like an auto-incrementing key.
For example if I have an primary key for a drawing I would like its revision to be based off the the drawing number.
Example
DRAWING ID | REV | INFO ------+-------+------ 1 | 0 | "Draw1" 2 | 0 | "Draw2" 2 | 1 | "Draw2Edit" 2 | 2 | "Draw2MoreEdit" 3 | 0 | "Draw3" 4 | 0 | "Draw4"
If I was to insert a few more records into my table such a:
INSERT INTO DRAWING (INFO) VALUES ("Draw5")
INSERT INTO DRAWING (ID,INFO) VALUES (3,"Draw3Edit")
My table would like:
DRAWING ID | REV | INFO ------+-------+------ 1 | 0 | "Draw1" 2 | 0 | "Draw2" 2 | 1 | "Draw2Edit" 2 | 2 | "Draw2MoreEdit" 3 | 0 | "Draw3" 3 | 1 | "Draw3Edit" --NEW ROW 4 | 0 | "Draw4" 5 | 0 | "Draw5" --NEW ROW
T-SQL
CREATE TABLE DRAWING
(
ID INT,
REV INT,
INFO VARCHAR(50),
PRIMARY KEY (ID,REV)
);
CREATE TABLE CURRENT_DRAWING
(
ID INT IDENTITY (1,1),
DRAWING_ID INT,
DRAWING_REV INT,
PRIMARY KEY (ID),
FOREIGN KEY (DRAWING_ID,DRAWING_REV) REFERENCES DRAWING (ID,REV)
ON UPDATE CASCADE
ON DELETE CASCADE
);
I am using SQL Server Management Studio 2005 and working on a SQL Server 2000 Database.
I will also accept possible alternatives. The primary goal is for the ID to auto-increment for new drawings. The ID will remain the same and the REV will increment on new drawing revisions.
Update:
I think I have it close to what I want:
DROP TABLE DRAW
GO
CREATE TABLE DRAW
(
ID INT DEFAULT(0),
REV INT DEFAULT(-1),
INFO VARCHAR(10),
PRIMARY KEY(ID, REV)
)
GO
CREATE TRIGGER TRIG_DRAW ON DRAW
FOR INSERT
AS
BEGIN
DECLARE @newId INT,
@newRev INT,
@insId INT,
@insRev INT
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
SELECT @insId = ID FROM inserted
SELECT @insRev = REV FROM inserted
PRINT 'BEGIN TRIG'
PRINT @insId
PRINT @insRev
PRINT @newId
PRINT @newRev
--IF ID=0 THEN IT IS A NEW ID
IF @insId <=0
BEGIN
--NEW DRAWING ID=MAX+1 AND REV=0
SELECT @newId = COALESCE(MAX(ID), 0) + 1 FROM DRAW
SELECT @newRev = 0
END
ELSE
--ELSE IT IS A NEW REV
BEGIN
--CHECK TO ENSURE ID EXISTS
IF EXISTS(SELECT * FROM DRAW WHERE ID=@insId AND REV=0)
BEGIN
PRINT 'EXISTS'
SELECT @newId = @insId
SELECT @newRev = MAX(REV) + 1 FROM DRAW WHERE ID=@insID
END
ELSE
--ID DOES NOT EXIST THEREFORE NO REVISION
BEGIN
RAISERROR 50000 'ID DOES NOT EXIST.'
ROLLBACK TRANSACTION
GOTO END_TRIG
END
END
PRINT 'END TRIG'
PRINT @insId
PRINT @insRev
PRINT @newId
PRINT @newRev
SELECT * FROM DRAW
UPDATE DRAW SET ID=@newId, REV=@newRev WHERE ID=@insId
COMMIT TRANSACTION
END_TRIG:
END
GO
INSERT INTO DRAW (INFO) VALUES ('DRAW1')
INSERT INTO DRAW (INFO) VALUES ('DRAW2')
INSERT INTO DRAW (ID,INFO) VALUES (2,'DRAW2EDIT1') --PROBLEM HERE
INSERT INTO DRAW (ID,INFO) VALUES (2,'DRAW2EDIT2')
INSERT INTO DRAW (INFO) VALUES ('DRAW3')
INSERT INTO DRAW (INFO) VALUES ('DRAW4')
GO
--SHOULD THROW
INSERT INTO DRAW (ID,INFO) VALUES (9,'DRAW9')
GO
SELECT * FROM DRAW
GO
However, I keep getting Violation of PRIMARY KEY constraint.
I have put debug statements and it seems unlikely that I am violating my primary key:
BEGIN TRIG 0 -1 END TRIG 0 -1 1 0 (1 row(s) affected) (1 row(s) affected) (1 row(s) affected) BEGIN TRIG 0 -1 END TRIG 0 -1 2 0 (2 row(s) affected) (1 row(s) affected) (1 row(s) affected) BEGIN TRIG 2 -1 EXISTS END TRIG 2 -1 2 1 (3 row(s) affected) Msg 2627, Level 14, State 1, Procedure TRIG_DRAW, Line 58 Violation of PRIMARY KEY constraint 'PK__DRAW__56D3D912'. Cannot insert duplicate key in object 'DRAW'. The statement has been terminated.
It prints
ID | REV | INFO ----+--------+------------ 1 | 0 | DRAW1 2 | -1 | DRAW2EDIT1 --This row is being updated to 2 1 2 | 0 | DRAW2
Just before it fails and the row 2 -1 is being updated to 2 1. It should not violate my primary key.
I would actually recommend an alternative data design. This kind of key-and-sequence pattern is very difficult to implement properly in a relational database, and the drawbacks often outweigh the benefits.
You have quite a few options, but the simplest ones start with splitting the table in two:
This has the benefit of representing the data accurately and working correctly with no additional effort on your part. Simply add a row to the
DRAWING_REVISIONtable when you want to add a new revision to a drawing. Because the primary keys use theIDENTITYspecification, you don’t have to do the work of finding the nextID.The Obvious Solution and Its Shortcoming
If you need a human-readable revision number, though, rather than for-your-server’s-eyes-only
ID, that can be done in two ways. They both start by addingREV INTto the data definition forDRAWING_REVISION, along with aCONSTRAINT UK_DRAWING_REVISION_DRAWING_ID_REV UNIQUE (DRAWING_ID, REV). The trick then, of course, is to find out the next revision number for a given drawing.If you expect to only every have a tiny number of concurrent users, you can simply
SELECT MAX(REV) + 1 FROM DRAWING_REVISION WHERE DRAWING_ID = @DRAWING_ID, either in your application code, or in anINSTEAD OF INSERTtrigger. With high concurrency or bad luck, though, users could end up blocking one another, because they could try to insert the same combination ofDRAWING_IDandREVintoDRAWING_REVISION.Some Background
There’s really only one solution to this problem, though explaining why there’s only one solution requires a little bit of background information. Consider the following code:
Of course, the output would differ on subsequent executions. Behind the scenes, SQL server is doling out
IDENTITYvalues and incrementing a counter. If you never actually commit the value, the server makes no attempt to “back-fill” holes in the sequence – the values are provided on a forward-only basis.This is a feature, not a bug.
IDENTITYcolumns are designed to be ordered and unique, but not necessary tightly packed. The only way to guarantee tight-packing is to serialize all incoming requests, making sure that each one either completes or terminates before the next one begins; otherwise, the server could try to back-fill anIDENTITYvalue that was issued a half hour ago, only to have a long-running transaction (i.e., the initial recipient of thatIDENTITYvalue) commit a row with a duplicate primary key.(It’s worth pointing out that when I say “transaction,” that doesn’t need to refer to a TSQL
TRANSACTION, though I would recommend their use. It could be absolutely any procedure on the application or SQL server side that might take any amount of time, even if that time is only the time it takes toSELECTthe next revision number and immediately thereafterINSERTthe newDRAWING_REVISION.)This attempt to back-fill values is just serialization in disguise, since, in a situation with two simultaneous
INSERTrequests, it punishes the second request to commit. This forces the last-comer to try again (possibly several times, until it just happens that there is no conflict). There is one successful submission at a time: serialization, though without the benefit of a queue.The
SELECT MAX(REV) + 1approach has the same disadvantage. Naturally, aMAXapproach doesn’t make any attempt to back-fill values, but it does force every concurrent request to fight over the same revision number, with the same results.Why is this bad? Database systems are designed for parallelism and currency: this ability is one of the primary advantages of a managed database over a flat-file format.
Faking It Right
So, after all that long-winded exposition, what can you do to solve the problem? You could cross your fingers and hope that your never see many concurrent users, but why would you wish against the wide-spread use of your own application? You don’t want success to be your downfall, after all.
The solution is to do what SQL Server does with
IDENTITYcolumns: dole them out, and then throw them a way. You could use something like the following SQL code, or use equivalent application code:The @@ROWCOUNT check is very important – this procedure needs to be non-transactional, because you don’t want to hide conflicts from concurrent requests; you want to resolve them. The only way to make sure that your update definitely went through is to check whether any rows were updated.
Of course, you might have guessed that this approach isn’t fool-proof. The only way to “resolve” conflicts is to try a few times before giving up. No home-brew solution will every be quite as good as the one hard-coded into the database server software. But it can get pretty close!
The stored procedure doesn’t eliminate conflicts, but it does massively shrink the time span over which a conflict can occur. Rather than “reserving” a revision number for a pending
INSERTtransaction, you receive the latest revision number and update the static counter as quickly as possible, getting out of the way for the next call toGET_REVISION_NUMBER. (This is serialized, to be sure, but only for the very tiny portion of the procedure that needs to be executed in a serial fashion; unlike in many other methods, the rest of the algorithm is free to execute in parallel.)My team used a solution similar to the one outlined above, and we found that the incidence of blocking conflicts went down by several orders of magnitude. We were able to submit thousands of back-to-back requests from a half dozen machines on the local network before one of them ended up stuck.
The stuck machine got trapped in a loop, requesting a new number from the SQL server, always getting a null result. It couldn’t get a word in edgewise, so to speak. This is similar to the conflict behavior in the
SELECT MAXcase, but much, much rarer. You trade the guaranteed consecutive numbering of theSELECT MAXapproach (and any related approach) for a thousand-fold increase in scalability. This trade-off is more or less fundamental: there is, to my knowledge, no guaranteed-consecutive, non-serialized solution.The Takeaway
Of course, all of this goop is predicated upon the need for a localized, semi-consecutive number. If you can live with less-user-friendly revision numbers, you could simply expose
DRAWING_REVISION.ID. (Exposing surrogates keys is unsavory in its own way, though, if you ask me.)The real takeaway here is that custom identity columns are harder to implement than it may first appear, and any application that may one day require scalability must be very careful about how it fetches new custom identity values.