Take the tsql query below:
DECLARE @table TABLE(data VARCHAR(20)) INSERT INTO @table VALUES ('not duplicate row') INSERT INTO @table VALUES ('duplicate row') INSERT INTO @table VALUES ('duplicate row') INSERT INTO @table VALUES ('second duplicate row') INSERT INTO @table VALUES ('second duplicate row') SELECT data INTO #duplicates FROM @table GROUP BY data HAVING COUNT(*) > 1 -- delete all rows that are duplicated DELETE FROM @table FROM @table o INNER JOIN #duplicates d ON d.data = o.data -- insert one row for every duplicate set INSERT INTO @table(data) SELECT data FROM #duplicates
I understand what it is doing, but the last part of logic (after –insert one row for every duplicate set), doesn’t make sense. Where we have the set of code for –delete all rows that are duplicated, that gets rid of the duplicates so what’s the part of the last section?
This query was found here
Thanks
First, it deletes all rows that ever had duplicates. That is, all rows, and original also. In the case above, only one row (
'not duplicate row') will remain in the table afterDELETE. All four other rows will be deleted.Then is populates the table with the deleted rows again, but now the duplicates are removed.
This is not the best way to delete duplicates.
The best way is: