
The Archive Base Latest Questions

Editorial Team
Asked: May 15, 2026


I’m currently working on a project where we have a large data warehouse which imports several GB of data on a daily basis from a number of different sources. We have a lot of files with different formats and structures, all being imported into a couple of base tables which we then transpose/pivot through stored procs. This part works fine. The initial import, however, is awfully slow.

We can’t use SSIS File Connection Managers, as the columns can be totally different from file to file, so we have a custom object model in C# which transposes rows and columns of data into two base tables: one for column names, and another for the actual data in each cell, which is related to a record in the attribute table.
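
For context, the two base tables map onto a very small object model; a minimal sketch of the shape being described, with type and property names assumed rather than taken from the actual project:

    // Hypothetical sketch of the EAV-style base tables as C# types (names assumed).
    // One record per discovered column name, and one record per cell value,
    // linked back to the attribute (column) it belongs to.
    public class AttributeRow
    {
        public int AttributeId { get; set; }   // surrogate key in the attribute table
        public string Name { get; set; }       // column name as found in the source file
    }

    public class CellValueRow
    {
        public int FileId { get; set; }        // which source file the value came from
        public int RowId { get; set; }         // row number within that file
        public int AttributeId { get; set; }   // reference to the matching AttributeRow
        public string Value { get; set; }      // raw cell value, stored as text
    }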

Example – Data Files: (screenshot not preserved)

Example – DB tables: (screenshot not preserved)

The SQL insert is currently performed by looping through all the data rows and appending the values to a SQL string. This builds up one large dynamic string, which is then executed at the end via SqlCommand.
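
In other words, the hot path is presumably something like the following sketch (a reconstruction of the pattern described, not the real project code; connectionString and unpivotedCells are assumed names):

    // Sketch of the pattern described above: every cell becomes another INSERT
    // appended to one big dynamic SQL string, executed in a single SqlCommand.
    using System.Text;
    using System.Data.SqlClient;

    var sql = new StringBuilder();
    foreach (var cell in unpivotedCells)   // flat list of unpivoted cell values (assumed name)
    {
        sql.AppendFormat(
            "INSERT INTO CellValue (FileId, RowId, AttributeId, Value) VALUES ({0}, {1}, {2}, '{3}');",
            cell.FileId, cell.RowId, cell.AttributeId, cell.Value.Replace("'", "''"));
    }

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql.ToString(), connection))
    {
        connection.Open();
        command.ExecuteNonQuery();         // one enormous batch: slow to build, parse and execute
    }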

The problem is that even a 1MB file takes about a minute to load, so large files (200MB and up) take hours to process. I’m looking for suggestions for other ways to approach the insert that will improve performance and speed up the process.

There are a few things I can do with the structure of the loop to cut down on the string size and the number of SQL commands in the string, but ideally I’m looking for a cleaner, more robust approach. Apologies if I haven’t explained myself well; I’ll provide more detail if required.

Any ideas on how to speed up this process?


1 Answer

Editorial Team
Added an answer on May 15, 2026 at 1:41 pm

The dynamic string is going to be SLOW. Each SqlCommand is a separate call to the database. You are much better off streaming the output as a bulk insertion operation.
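
If you did want to stay in plain C# rather than move this into SSIS, SqlBulkCopy is the usual way to get that streaming bulk behaviour. A rough sketch, assuming the value table looks like CellValue(FileId, RowId, AttributeId, Value) as in the question:

    // Rough sketch of a bulk insert from C# via SqlBulkCopy (an alternative to the
    // SSIS route described below). Table and column names are assumed.
    using System.Data;
    using System.Data.SqlClient;

    var table = new DataTable();
    table.Columns.Add("FileId", typeof(int));
    table.Columns.Add("RowId", typeof(int));
    table.Columns.Add("AttributeId", typeof(int));
    table.Columns.Add("Value", typeof(string));

    foreach (var cell in unpivotedCells)   // output of the existing C# parsing (assumed name)
        table.Rows.Add(cell.FileId, cell.RowId, cell.AttributeId, cell.Value);

    using (var connection = new SqlConnection(connectionString))
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        connection.Open();
        bulkCopy.DestinationTableName = "dbo.CellValue";   // assumed table name
        bulkCopy.BatchSize = 10000;
        bulkCopy.WriteToServer(table);                     // streams all rows in one bulk operation
    }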

I understand that all your files are in different formats, so you are having to parse and unpivot in code to get them into your EAV database form.

However, because the output is in a consistent schema, you would be better off either using separate connection managers and the built-in unpivot operator, or, in a script task, adding multiple rows to the data flow in the common output (just like you are currently doing in building your SQL INSERT…INSERT…INSERT for each input row) and then letting it all stream into a destination.

i.e. read your data and, in the script source, assign the FileID, RowId, AttributeName and Value to multiple rows (so this is doing the unpivot in code, but instead of generating a varying number of inserts, you are just inserting a varying number of rows into the data flow based on the input row).
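
In script-component terms, that unpivot might look roughly like this (the output columns follow the names above; ParseFile and the FileName variable are placeholders for your own parsing code):

    // Rough sketch of an SSIS script component configured as a source; this override
    // lives in the generated ScriptMain class. Output 0 is assumed to have columns
    // FileID, RowId, AttributeName and Value.
    public override void CreateNewOutputRows()
    {
        foreach (var row in ParseFile(Variables.FileName))   // placeholder parsing helper and variable
        {
            // One output row per cell: the unpivot is done in code here.
            foreach (var cell in row.Cells)
            {
                Output0Buffer.AddRow();
                Output0Buffer.FileID = row.FileId;
                Output0Buffer.RowId = row.RowId;
                Output0Buffer.AttributeName = cell.ColumnName;
                Output0Buffer.Value = cell.Value;
            }
        }
    }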

Then pass that through a lookup to get from AttributeName to AttributeID (erroring the rows with invalid attributes).

Stream straight into an OLE DB destination, and it should be a lot quicker.


