I have a table with 10 million records and a nonclustered index on one column, and I am trying to dedupe it. I tried INSERT ... SELECT using either a LEFT JOIN or WHERE NOT EXISTS, but each time I get a duplicate key violation. Here are the queries I used:
insert into temp(profile,feed,photo,dateadded)
select distinct profile,feed,photo,dateadded from original as s
where not exists(select 1 from temp as t where t.profile=s.profile)
This just produces the duplicate key violation. I also tried the following:
insert into temp(profile,feed,photo,dateadded)
select distinct s.profile,s.feed,s.photo,s.dateadded from original as s
left outer join temp t on t.profile=s.profile
where t.profile is null
I ended up using batched inserts because the log file was growing too large, but I still get the duplicate key violation even with batches of only 1000 records.
Destination table: IX_Temp on profileUrl (ASC), unique, nonclustered
Source table: IX_PURL on profileUrl (ASC), nonclustered, not unique
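The failure can be reproduced in miniature. This sketch uses Python's built-in sqlite3 module with SQLite as a stand-in for SQL Server, so it approximates the situation rather than testing T-SQL. Assumptions: the table temp is renamed to the hypothetical temp_table to avoid clashing with SQLite's temp schema, the sample rows are made up, and the LEFT JOIN version qualifies its columns with s. because both tables share column names.

```python
import sqlite3

# Hypothetical sample data: two rows for "alice" that differ only in the
# milliseconds of dateadded, so SELECT DISTINCT keeps both of them.
SOURCE_ROWS = [
    ("alice", "feed1", "a.jpg", "2012-01-01 10:00:00.000"),
    ("alice", "feed1", "a.jpg", "2012-01-01 10:00:00.003"),
    ("bob", "feed2", "b.jpg", "2012-01-02 09:30:00.000"),
]

# The two query shapes from the question, pointed at the stand-in table.
QUERIES = {
    "NOT EXISTS": """
        INSERT INTO temp_table (profile, feed, photo, dateadded)
        SELECT DISTINCT profile, feed, photo, dateadded FROM original AS s
        WHERE NOT EXISTS (SELECT 1 FROM temp_table AS t WHERE t.profile = s.profile)
    """,
    "LEFT JOIN": """
        INSERT INTO temp_table (profile, feed, photo, dateadded)
        SELECT DISTINCT s.profile, s.feed, s.photo, s.dateadded FROM original AS s
        LEFT OUTER JOIN temp_table AS t ON t.profile = s.profile
        WHERE t.profile IS NULL
    """,
}


def make_db():
    """Build a fresh in-memory copy of both tables; temp_table stands in for temp."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE original (profile TEXT, feed TEXT, photo TEXT, dateadded TEXT);
        CREATE TABLE temp_table (profile TEXT, feed TEXT, photo TEXT, dateadded TEXT);
        CREATE UNIQUE INDEX IX_Temp ON temp_table (profile);
    """)
    conn.executemany("INSERT INTO original VALUES (?, ?, ?, ?)", SOURCE_ROWS)
    conn.commit()
    return conn


def try_insert(sql):
    """Run one INSERT ... SELECT on a fresh database and report the outcome."""
    conn = make_db()
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"duplicate key: {exc}"
    finally:
        conn.close()


results = {name: try_insert(sql) for name, sql in QUERIES.items()}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

Both statements fail the same way: the NOT EXISTS check and the LEFT JOIN only see rows that were in the destination before the statement started, so two source rows with the same profile are both inserted and the second one trips the unique index.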
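I imagine that DISTINCT isn't working as you expect here, because the time portion of dateadded will be slightly different between rows for the same profile, so those rows still count as distinct. NOT EXISTS (or the LEFT JOIN) only filters out profiles that were already in temp before the statement ran; it does not stop two rows with the same profile from the same SELECT from both being inserted, which is why even a 1000-row batch can fail.
A different approach would be to use GROUP BY and take the earliest dateadded to remove any duplicates. Maybe something like this: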
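A minimal sketch of the GROUP BY idea, again using Python's sqlite3 module with SQLite standing in for SQL Server and made-up rows; temp_table is a hypothetical stand-in for temp. The INSERT ... SELECT is plain SQL and should map onto T-SQL with the real table names, but treat it as a starting point rather than a tested SQL Server statement. It groups on profile, feed and photo and keeps MIN(dateadded), while the NOT EXISTS check lets later batches skip profiles that are already loaded.

```python
import sqlite3

# Hypothetical setup: temp_table stands in for temp, and the unique index
# on profile stands in for IX_Temp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE original (profile TEXT, feed TEXT, photo TEXT, dateadded TEXT);
    CREATE TABLE temp_table (profile TEXT, feed TEXT, photo TEXT, dateadded TEXT);
    CREATE UNIQUE INDEX IX_Temp ON temp_table (profile);
""")
# Made-up rows: "alice" appears twice with slightly different timestamps.
conn.executemany(
    "INSERT INTO original VALUES (?, ?, ?, ?)",
    [
        ("alice", "feed1", "a.jpg", "2012-01-01 10:00:00.003"),
        ("alice", "feed1", "a.jpg", "2012-01-01 10:00:00.000"),
        ("bob", "feed2", "b.jpg", "2012-01-02 09:30:00.000"),
    ],
)
conn.commit()

# GROUP BY collapses rows that differ only in dateadded, and MIN keeps the
# earliest one. NOT EXISTS still skips profiles loaded by an earlier batch.
GROUP_BY_INSERT = """
    INSERT INTO temp_table (profile, feed, photo, dateadded)
    SELECT s.profile, s.feed, s.photo, MIN(s.dateadded)
    FROM original AS s
    WHERE NOT EXISTS (SELECT 1 FROM temp_table AS t WHERE t.profile = s.profile)
    GROUP BY s.profile, s.feed, s.photo
"""

with conn:  # commit the first load
    conn.execute(GROUP_BY_INSERT)

deduped = conn.execute(
    "SELECT profile, dateadded FROM temp_table ORDER BY profile"
).fetchall()
print(deduped)
# [('alice', '2012-01-01 10:00:00.000'), ('bob', '2012-01-02 09:30:00.000')]

# Re-running the statement, as a later batch would, inserts nothing new.
with conn:
    conn.execute(GROUP_BY_INSERT)
row_count = conn.execute("SELECT COUNT(*) FROM temp_table").fetchone()[0]
print(row_count)
```

MIN works on the text timestamps here only because they share one sortable format; in SQL Server dateadded would be a datetime and MIN compares it directly. If feed or photo can also differ for the same profile, grouping on them still yields more than one row per profile and the unique index will reject it; you would then need to group by profile alone and decide how to pick feed and photo, which this sketch does not attempt.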