What would be a good way to paginate commits when I have a query like this?
BEGIN TRANSACTION
    INSERT INTO table1 FROM table2
    INSERT INTO table3 FROM table4
COMMIT
I am dealing with large amounts of data and I am having trouble committing the whole thing at once, so I would like to commit about 5000 rows at a time.
I thought about something like:
- maxNumber = get the max number of rows among number of rows from table2 and 3
- maxNumber/5000 = numberOfCommits
- create a loop from 1 to numberOfCommits and process data at row number (using ROW_NUMBER()) (n-1)*5000 to n*5000
Would be great to learn how to do it in a better way!
Thanks in advance!
Processing an entire table in batches based on ROW_NUMBER() is potentially a bad idea. To return row number 5001, the engine first has to count 5000 rows. To read row 10001, it has to count the first 5000 again, then the next 5000, and so on. The pattern is very read intensive: the later the batch, the more rows get re-read. If the tables are small it does not matter, but if they're not…
If your table(s) have at least one unique index (preferably the clustered one), then you can use a combination of TOP 5000 and WHERE uniquecolumn > @lastbatchmaxvalue. If you don't have such a unique index, then you can only do this via a cursor.
But maybe the best solution is to get out of the T-SQL constraints. SSIS is ideally suited to do exactly this kind of job: it supports batches and works with the efficient bulk insert interface when possible.
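The TOP 5000 plus WHERE uniquecolumn > @lastbatchmaxvalue approach mentioned above could look roughly like the following. This is only a sketch under assumptions, not a tested script: the column names id and payload are hypothetical, id is assumed to be an integer column with a unique index on table2, and all id values are assumed to be greater than 0.

```sql
-- Sketch only. Hypothetical schema: dbo.table2(id, payload) is the source,
-- dbo.table1(id, payload) is the target, and id is a unique, indexed int.
DECLARE @BatchSize int = 5000;
DECLARE @LastId    int = 0;   -- assumption: every id is greater than 0
DECLARE @NextId    int;

WHILE 1 = 1
BEGIN
    -- Find the upper bound of the next batch: the largest id among the
    -- next @BatchSize rows after the previous batch. MAX over an empty
    -- set yields NULL, which signals that nothing is left to copy.
    SELECT @NextId = MAX(batch.id)
    FROM (
        SELECT TOP (@BatchSize) id
        FROM dbo.table2
        WHERE id > @LastId
        ORDER BY id
    ) AS batch;

    IF @NextId IS NULL
        BREAK;

    -- Copy exactly that key range and commit it on its own.
    BEGIN TRANSACTION;

    INSERT INTO dbo.table1 (id, payload)
    SELECT id, payload
    FROM dbo.table2
    WHERE id > @LastId
      AND id <= @NextId;

    COMMIT TRANSACTION;

    -- Remember where this batch ended so the next one starts after it.
    SET @LastId = @NextId;
END;
```

Because each batch seeks directly to id > @LastId through the index, no batch has to re-count the rows that earlier batches already handled, which is the cost that the ROW_NUMBER() approach pays. The ORDER BY id inside the TOP subquery matters: without it, TOP returns an arbitrary set of rows and the batch boundary is not well defined. The same loop would be repeated (or extended) for the table4 to table3 copy.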