I’m using SSIS and BIDS to process a text file that contains millions of records. I decided to use the Bulk Insert Task and it worked great, but then the destination table needed an additional column with a default value on insert, and the Bulk Insert Task stopped working. After that, I switched to a Derived Column with the default value and an OLE DB Destination to insert the bulk data. That solved my last problem but created a new one: if there is an error when inserting the data in the OLE DB Destination, it performs a full rollback and no rows are added to my table, whereas with the Bulk Insert Task the rows from the batches already completed (according to the BatchSize setting) stayed in the table. Let me explain with a sample:
- I used a text file with 5000 lines. The file intentionally contained a duplicate id between rows 3000 and 4000.
- Before starting the DTS, the destination table was totally empty.
- Using the Bulk Insert Task, after the error was raised (and the DTS stopped), the destination table had 3000 rows. I had set the BatchSize attribute to 1000.
- Using the OLE DB Destination, after the error was raised, the destination table had 0 rows! I had set the Rows per batch attribute to 1000 and the Maximum insert commit size to its maximum value, 2147483647. I also tried changing the latter to 0, with no effect.
Is this the normal behavior of the OLE DB Destination? Can someone point me to a guide on working with these tasks? Should I stop using these tasks and use BULK INSERT from T-SQL instead?
As a side note, I also tried following the instructions for KEEPNULLS in "Keep Nulls or Use Default Values During Bulk Import (SQL Server)" so I could avoid the OLE DB Destination, but it didn’t work (maybe it’s just me).
EDIT: Additional info about the problem.
Table structure (sample)
Table T
id int, name varchar(50), processed int default 0
CSV File (sample)
1, hello
2, world
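Bulk inserts do not roll back batches that have already been committed. Each batch is committed as its own transaction, so only the batch that fails is lost, and that is part of why they are fast.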
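To make the batch behavior concrete, here is a minimal sketch that reproduces the 5000-row test from the question in miniature. It uses Python with an in-memory SQLite table as a stand-in for SQL Server, so it only models the commit pattern (one commit per batch versus one commit for everything), not SSIS itself. The helper names make_rows and load are made up for this illustration.

```python
import sqlite3


def make_rows(total=5000, duplicate_at=3500):
    """Build (id, name) rows; row number duplicate_at repeats the previous id."""
    rows = [(i, f"name{i}") for i in range(1, total + 1)]
    # Intentional duplicate key between rows 3000 and 4000, as in the question.
    rows[duplicate_at - 1] = (rows[duplicate_at - 2][0], "duplicate")
    return rows


def load(rows, commit_size):
    """Insert rows, committing every commit_size rows; return rows left in T."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE T (id INTEGER PRIMARY KEY, name TEXT, "
        "processed INTEGER DEFAULT 0)"
    )
    conn.commit()
    try:
        for start in range(0, len(rows), commit_size):
            batch = rows[start:start + commit_size]
            conn.executemany("INSERT INTO T (id, name) VALUES (?, ?)", batch)
            conn.commit()  # this batch is now permanent
    except sqlite3.IntegrityError:
        conn.rollback()  # only the uncommitted (failing) batch is undone
    count = conn.execute("SELECT COUNT(*) FROM T").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    rows = make_rows()
    print(load(rows, 1000))        # 3000: three batches committed before the error
    print(load(rows, 2147483647))  # 0: everything was one batch, so all is rolled back
```

With a commit size of 1000 the first three batches survive the duplicate-key error. With a commit size of 2147483647 the whole file is a single batch, which matches the 0 rows seen in the question.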
Take a look at using format files:
http://msdn.microsoft.com/en-us/library/ms179250.aspx
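For table T from the question, the idea would be a non-XML format file that maps only the two fields present in the text file (id and name) and leaves out the processed column. Here is a rough sketch that generates such a file with Python. Several things in it are assumptions rather than something taken from the question: the version line 9.0 (SQL Server 2005), the field lengths, the collation name, the terminators, and the function name build_format_file. Please check the layout against the linked page before relying on it.

```python
def build_format_file(columns, field_terminator=",",
                      row_terminator=r"\r\n", version="9.0"):
    """Return the text of a non-XML bulk-import format file.

    columns is a list of (server_column_order, name, max_length, collation),
    one entry per field in the data file, in file order. Table columns that
    are not listed (here: processed) are simply not mapped.
    """
    lines = [version, str(len(columns))]
    for position, (order, name, length, collation) in enumerate(columns, start=1):
        # The last field of each record ends with the row terminator.
        is_last = position == len(columns)
        terminator = row_terminator if is_last else field_terminator
        # An empty collation is written as a pair of double quotes.
        collation_text = collation if collation else '""'
        lines.append(
            f'{position}\tSQLCHAR\t0\t{length}\t"{terminator}"'
            f"\t{order}\t{name}\t{collation_text}"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_format_file([
        (1, "id", 12, ""),
        (2, "name", 50, "SQL_Latin1_General_CP1_CI_AS"),  # assumed collation
    ])
    with open("T.fmt", "w", newline="") as handle:
        handle.write(text)
```

The resulting file would then be passed to BULK INSERT through its FORMATFILE option, together with BATCHSIZE. As far as I know, a table column skipped this way has to be nullable or have a default, which the processed column does.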