There is a post here that talks about this as being a bad idea and that instead of using a unique constraint I have to always INSERT only data not in the database. Now my table structure is as follows:
CREATE TABLE [dbo].[TEST](
[ColA] [varchar](255) NULL,
[ColB] [datetime] NULL,
[ColC] [datetime] NULL,
[ColD] [int] NOT NULL,
) ON [PRIMARY]
Only unique rows are expected so for me it makes sense in setting up a unique constraint as:
ALTER TABLE dbo.TEST
ADD CONSTRAINT uniqueRows UNIQUE (ColA, ColB, ColC, ColD) WITH (IGNORE_DUP_KEY = ON)
When new information is being consumed, due to many reasons, there could be duplicate data being processed because my code does not maintain state and hence at least to me it makes sense to ignore any duplicate data.
However, in the linked post, @PhilipKelley in his answer, says that this is a bad idea and that one should make checks something along the lines of:
INSERT INTO X
VALUES(Y,Z)
WHERE Y NOT IN (SELECT Y FROM X)
which in my case translates to:
INSERT INTO dbo.TEST
VALUES(ValA,ValB,ValC,ValD)
WHERE (Some complicated check)
Or perhaps make some fancy primary key. The response here hints that if I know this is indeed being used as a feature, I can go ahead and use the IGNORE_DUP_KEY option and things will be ok. What is the suggested path in my case?
Personally I think you should do three things:
For single-row inserts, this can be as simple as:
It gets a little more complicated if any or all of these columns are nullable.
For multi-row inserts, you can handle this in a variety of ways. You can have the entire batch fail if there is a non-unique value (either in the batch alone or conflicting with the table), or you can allow just the successful rows into the table. Situation A:
Scenario B, where you want to keep the good rows and ignore duplicates: