Say I have input data like so:
firstName | lastName | Country
Bob | Smith | UK
Jane | Doe | France
Hank | Scorpio | UK
and the target tables are:
People
ID | firstName | lastName | CountryId
Country
ID | CountryName
0 | France
Now, in an SSIS data flow task, I read the input and use a lookup to search the Country table for a matching CountryName. If it exists, no problem: return the ID and carry on. If it does not exist, I want to use an OLEDBCommand to create the record in the Country table, get the ID, and carry on.
However, what is happening is that UK is getting passed to the OLEDBCommand twice.
How should I be handling this scenario? Is there some way of forcing the lookup to check one record at a time? It seems to be checking a batch before adding the missing records. I have tried changing the cache options between full and none, to no effect.
You could work around this by setting the buffer size to 1 (one row per buffer) and using no cache on the lookup, but even so, because SSIS parallelises the data flow, you risk having two rows hit the lookup at the “same time”.
What you can do however is to think of it differently. Perhaps you don’t need to do it all in one step.
First, load all users and check them against the Country table. For those with no matching country, group the rows in an aggregate (by country) so each missing country appears only once, and insert the results into your Country table.
Then, you can load all your users using a normal lookup – because the Country table has been prefilled.
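To make the two-pass idea concrete, here is a minimal sketch of the logic in Python with an in-memory SQLite database. It is not SSIS itself, only an illustration of what the two data flows would do. The table layout follows the question; the column types, the `UNIQUE` constraint, and the variable names (`rows`, `known`, `missing`, `lookup`) are assumptions made for the example.

```python
import sqlite3

# Sample input rows from the question: (firstName, lastName, Country)
rows = [
    ("Bob", "Smith", "UK"),
    ("Jane", "Doe", "France"),
    ("Hank", "Scorpio", "UK"),
]

conn = sqlite3.connect(":memory:")
# Assumed schema; the UNIQUE constraint would reject a duplicate country.
conn.execute(
    "CREATE TABLE Country (ID INTEGER PRIMARY KEY, CountryName TEXT UNIQUE)"
)
conn.execute(
    "CREATE TABLE People (ID INTEGER PRIMARY KEY, firstName TEXT, "
    "lastName TEXT, CountryId INTEGER)"
)
conn.execute("INSERT INTO Country (ID, CountryName) VALUES (0, 'France')")

# Pass 1: find countries with no match, collapse duplicates (the
# "aggregate" step), and insert each missing country exactly once.
known = {name for (name,) in conn.execute("SELECT CountryName FROM Country")}
missing = sorted({country for _, _, country in rows} - known)
conn.executemany(
    "INSERT INTO Country (CountryName) VALUES (?)", [(c,) for c in missing]
)

# Pass 2: a plain lookup now succeeds for every row, because the
# Country table was prefilled in pass 1.
lookup = {
    name: cid for cid, name in conn.execute("SELECT ID, CountryName FROM Country")
}
conn.executemany(
    "INSERT INTO People (firstName, lastName, CountryId) VALUES (?, ?, ?)",
    [(first, last, lookup[country]) for first, last, country in rows],
)
conn.commit()
```

Because the duplicates are collapsed before anything is written, UK reaches the insert only once, no matter how the rows are batched or buffered.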