I’m extracting from a pipe-delimited file and inserting into a SQL Server 2008 R2 database table. One of my integer columns no longer contains the correct value by the time it reaches the table.
I can add a data viewer to the data flow immediately after my first object (the flat file source) and compare the data side by side with the source file opened in Notepad. My string columns are all OK, but these unique seven-digit integers are replaced by one of just three values (even though there are 16K unique rows in the original file). The new values look like the ones they replace, with the same format and range, but they don’t appear in the source file. They actually look as if they’ve been cached somewhere.
Some more information: The external column in the source is a 50 char string, the output column is a 4 byte int. The connection string for the file source is set by an expression based upon a variable set by an earlier script that looks for candidate files in an import directory. There are no other tasks that transform or otherwise modify data either before or after; this package is purely an extract process for another process to deal with the data. The values that are being substituted in do not appear in the XML of the package file (I searched it in case we’d left some old piece of code that was messing with data).
I can recreate the tasks and everything seems to work but I don’t see any difference in properties that would explain this, and then I’d be worried that it will break again. I’d really like to understand what is going wrong here.
Any ideas what could ‘corrupt’ data like this?
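A check outside SSIS can help separate a problem in the file from a problem in the package. The following is only a minimal sketch in Python, not part of the package: the function name `summarize_int_column`, the column position, and the assumption that the file has no header row are all hypothetical. It counts the rows, the distinct integer values, and any values that would not fit a 4-byte int.

```python
import csv
import io

# Range of a 4-byte signed integer, matching the output column type.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def summarize_int_column(lines, column_index, delimiter="|"):
    """Return (row_count, distinct_count, bad_values) for one column.

    `lines` is any iterable of text lines (an open file or StringIO).
    `column_index` is a hypothetical zero-based column position.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    row_count = 0
    distinct = set()
    bad_values = []
    for row in reader:
        row_count += 1
        raw = row[column_index].strip()
        try:
            value = int(raw)
        except ValueError:
            # Not parseable as an integer at all.
            bad_values.append(raw)
            continue
        if not INT32_MIN <= value <= INT32_MAX:
            # Parses, but would overflow a 4-byte int.
            bad_values.append(raw)
            continue
        distinct.add(value)
    return row_count, len(distinct), bad_values


# Small in-memory sample standing in for the real file.
sample = io.StringIO("alpha|1234567\nbeta|7654321\ngamma|12x4567\n")
rows, distinct, bad = summarize_int_column(sample, column_index=1)
print(rows, distinct, bad)  # 3 2 ['12x4567']
```

For a real file, the same function can be given a handle from `open(path, newline="", encoding=...)`, where the encoding is whatever the file is believed to use. If the distinct count here is close to 16K while the data viewer shows only three values, the file itself is fine and the substitution is happening inside the package.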
This sounds like it could be a Code Page issue. I would suggest 2 options