I want to create a CLR UDF that scans a SQL table, reads each column, and sums all the data for
each column (I am working with big datasets: more than 1,000 columns and more than 20,000,000 rows).
I'd like to compare applying the SQL function SUM(COLUMN_NAME) to each column against a parallelized for loop.
So the SQL would look like:
SELECT SUM(COLUMN_1),SUM(COLUMN_2),SUM(COLUMN_3),...,SUM(COLUMN_1000)
How can I write a CLR UDF in C# that would do that?
I am planning to use an array, so each time I read a row I add its values like:
array[i]+= sqlValue;
How do I do this so that I can execute both of them in a stored proc?
Table
column_1  column_2  column_3  ....  column_1000
-----------------------------------------------
451       57        253             135
251       77        356             965
481       15        323             655
452       15        135             665
...
...20,000,000 more rows
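The per-row accumulation described above (array[i] += sqlValue) can be sketched as follows. This is only a minimal illustration, not a CLR UDF: the class name ColumnSums and the helper SumColumns are hypothetical, the rows are an in-memory list standing in for whatever reader would deliver them from SQL Server, and it assumes integer column values held as long with totals kept as decimal.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnSums
{
    // Hypothetical helper: keeps one running total per column and
    // adds every row into it, i.e. totals[i] += value for each column i.
    public static decimal[] SumColumns(IEnumerable<long[]> rows, int columnCount)
    {
        var totals = new decimal[columnCount];
        foreach (long[] row in rows)
        {
            for (int i = 0; i < columnCount; i++)
            {
                totals[i] += row[i]; // long converts implicitly to decimal
            }
        }
        return totals;
    }

    public static void Main()
    {
        // The four sample rows from the table above
        // (column_1, column_2, column_3, column_1000).
        var rows = new List<long[]>
        {
            new long[] { 451, 57, 253, 135 },
            new long[] { 251, 77, 356, 965 },
            new long[] { 481, 15, 323, 655 },
            new long[] { 452, 15, 135, 665 },
        };

        decimal[] totals = SumColumns(rows, 4);
        Console.WriteLine(string.Join(", ", totals)); // prints "1635, 164, 1067, 2420"
    }
}
```

In a real CLR routine the rows would come from a data reader over the table rather than a list, so every value still has to cross from the storage engine into managed code before it is added.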
So what you're trying to achieve is to run a thousand SELECT column_x FROM table queries, one for each column, and do the summing by hand. That means 1000 parallel connections to the database, all working on the same rows of the same table and locking each other (unless you use WITH (NOLOCK)).
I can't see any benefit over SELECT sum(column_1), sum(column_2), ...., and I believe what you're trying will be orders of magnitude slower than letting SQL Server do what it does best.
EDIT:
As per your request, here's a quick'n'dirty sample. It is not tested, since I currently have no SQL Server at hand. I assumed the columns are of type long and the result is of type decimal.