I have a process that bulk inserts into a table from a CSV. I now have a requirement that some data coming from the CSV will contain ‘updated’ records (data that was imported previously but now has changes).
At this point I have a table full of duplicates. Is it possible to have BULK INSERT update (or even delete before insert) the records based on the PK?
I would rather not add a second step to this process to remove duplicates.
Edit: Instead of a staging table, I’m just going to run a delete query similar to the following:
-- Sample data: several ids share the same ref
declare @tbl table
(
    id  int,
    ref nvarchar(10)
)

insert into @tbl
values (1, 'AAAA'),
       (2, 'BBBB'),
       (3, 'CCCC'),
       (4, 'AAAA'),
       (5, 'BBBB'),
       (6, 'AAAA')

-- For each ref, keep the row with the highest id and delete the rest
delete from @tbl
where id in (
    select id
    from (
        select id,
               ref,
               RANK() OVER (partition by ref order by id desc) as rnk
        from @tbl
    ) d
    where rnk > 1
)

select * from @tbl
If it were me, I would load to a staging table and remove the duplicates from there. I’m not sure whether BCP has that functionality, but I would be concerned about any logic being applied that I don’t have direct and visible control over.
Handling duplicates inside the load itself would also prevent you from running QC checks on the data you are loading. With a staging table you can do a PK comparison of some sort to make sure you have the correct number of distinct values.
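As a rough sketch of what the staging approach could look like, assuming a target table shaped like the sample in the question: the names dbo.Import_Staging and dbo.Target, the file path, and the terminators are all placeholders I made up, and the CSV is assumed to have no header row and at most one row per id (if a file can repeat an id, deduplicate the staging table before steps 3 and 4).

```sql
-- Hypothetical names throughout: dbo.Import_Staging, dbo.Target, and the
-- file path are placeholders. Assumes dbo.Target(id int primary key,
-- ref nvarchar(10)) already exists.

-- One-time setup: a staging table with no constraints.
create table dbo.Import_Staging
(
    id  int,
    ref nvarchar(10)
);

-- 1. Load the raw CSV into the staging table.
bulk insert dbo.Import_Staging
from 'C:\data\import.csv'   -- placeholder path
with (fieldterminator = ',', rowterminator = '\n');

-- 2. QC check: the two counts should match if every id in the file is unique.
select count(*)           as total_rows,
       count(distinct id) as distinct_ids
from dbo.Import_Staging;

-- 3. Update rows that already exist in the target.
update t
set    t.ref = s.ref
from   dbo.Target as t
join   dbo.Import_Staging as s on s.id = t.id;

-- 4. Insert rows that are new.
insert into dbo.Target (id, ref)
select s.id, s.ref
from   dbo.Import_Staging as s
where  not exists (select 1 from dbo.Target as t where t.id = s.id);

-- 5. Empty the staging table for the next run.
truncate table dbo.Import_Staging;
```

Steps 3 and 4 could probably be collapsed into a single MERGE statement, but keeping them separate makes it easier to see how many rows were updated versus inserted.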