I receive a daily CSV file that contains duplicate records. When I try to import it with BULK INSERT, the load fails with a primary key violation because of the duplicates.
To fix this, I am thinking about importing the data into a new table with no primary key constraint and then running the following query:
INSERT INTO final_table(col1, col2, col3)
SELECT DISTINCT col1, col2, col3
FROM temporary_table
Is this the best approach, or is there an easier way to do this in SQL Server 2008?
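A minimal sketch of the approach in the question, written in plain SQL that should run on SQL Server 2008 (and on most other engines). The table and column names are the question's placeholders, and I'm assuming col1 is the primary key. The hard-coded sample rows only stand in for what BULK INSERT would load from the CSV. The NOT EXISTS filter is an addition, assuming rows from earlier daily files may already be sitting in final_table.

```sql
-- Assumption: col1 is the primary key of the final table.
CREATE TABLE final_table (
    col1 INT NOT NULL PRIMARY KEY,
    col2 VARCHAR(50),
    col3 VARCHAR(50)
);

-- Staging table: same columns, deliberately no constraints, so the raw file always loads.
CREATE TABLE temporary_table (
    col1 INT,
    col2 VARCHAR(50),
    col3 VARCHAR(50)
);

-- In SQL Server, BULK INSERT would fill temporary_table from the CSV here;
-- these sample rows stand in for the file.
INSERT INTO temporary_table (col1, col2, col3) VALUES
    (1, 'a', 'x'),
    (1, 'a', 'x'),   -- exact duplicate within today's file
    (2, 'b', 'y');

-- Assumed: key 2 was already loaded from a previous day's file.
INSERT INTO final_table (col1, col2, col3) VALUES (2, 'b', 'y');

-- DISTINCT collapses exact duplicates; NOT EXISTS skips keys already in final_table.
INSERT INTO final_table (col1, col2, col3)
SELECT DISTINCT t.col1, t.col2, t.col3
FROM temporary_table AS t
WHERE NOT EXISTS (
    SELECT 1 FROM final_table AS f WHERE f.col1 = t.col1
);
```

Keep in mind that DISTINCT only removes rows that are identical in every column: two rows with the same col1 but a different col3 would both survive it and still violate the primary key.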
This “new” table is usually called a staging table. It should have as few restrictions as possible, i.e. few or no constraints, so the raw file always loads. Once the data is there, you scrub it and load it into your “final” table.
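One way the scrub step could look when duplicates share a key but differ in other columns, which SELECT DISTINCT alone would not catch. This is a sketch rather than the only option: it assumes col1 is the key, uses a hypothetical staging_table, drops rows that have no key, and keeps one row per key with ROW_NUMBER() (available since SQL Server 2005). The tie-break ORDER BY col3 is an arbitrary placeholder; substitute whatever rule decides which duplicate should win.

```sql
-- Assumption: col1 is the primary key of the final table.
CREATE TABLE final_table (
    col1 INT NOT NULL PRIMARY KEY,
    col2 VARCHAR(50),
    col3 VARCHAR(50)
);

-- Hypothetical staging table with no constraints.
CREATE TABLE staging_table (
    col1 INT,
    col2 VARCHAR(50),
    col3 VARCHAR(50)
);

-- Sample rows standing in for the bulk-loaded CSV.
INSERT INTO staging_table (col1, col2, col3) VALUES
    (1, 'a', 'x'),
    (1, 'a', 'z'),    -- same key, different col3: DISTINCT would keep both
    (2, 'b', 'y'),
    (NULL, 'c', 'w'); -- no key at all: cannot go into the final table

-- Number the rows within each key and keep only the first one.
-- The ORDER BY col3 tie-break is an assumed, arbitrary rule.
INSERT INTO final_table (col1, col2, col3)
SELECT col1, col2, col3
FROM (
    SELECT col1, col2, col3,
           ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col3) AS rn
    FROM staging_table
    WHERE col1 IS NOT NULL   -- scrub out rows missing the key
) AS ranked
WHERE rn = 1;
```

Because the staging table is left untouched, the original file contents are still there to compare against if the scrub rule turns out to be wrong.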
I think what you’re proposing is the simplest approach, unless you’re using SSIS and are adamant about not using a staging table. I generally like keeping a staging table around so I can see an exact replica of the file if something goes wrong; it helps with troubleshooting.