We need to perform the following operation in our database:
There is a table A which has column B_ID that is a foreign key to the table B. There are many rows in the table A that have the same value of B_ID and we want to fix this by cloning the corresponding rows in B and redirecting the rows from A to them.
All this is relatively simple, and we have already created a script that solves it by iterating over a cursor and calling a stored procedure to clone the row in table B. The problem is that both tables A and B are huge, and there is also a huge number of groups within table A pointing to the same row in B.
What we end up with (after a couple of minutes of execution) is a full transaction log and a crash. We have even tried dividing the work into reasonably sized batches and running them one by one, but this also eventually fills up the log.
Apart from somehow cleaning up the log, is there some way to handle bulk inserts/updates of data in SQL Server that would be faster and not blow up the log at all?
Here’s another way to do this in a batch (no cursors). @KM’s answer looks like it should work, but it looks a little slow/scary to me, with lots of locking and scans involved; if you restrict the working set to only the new rows then it should be pretty fast.
Here’s the setup script for the test data:
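A minimal sketch of such a setup, assuming hypothetical tables `Colours` (playing the role of table B) and `Markers` (playing the role of table A). The table names, columns and sample rows are illustrative assumptions, and the multi-row `VALUES` syntax needs SQL Server 2008 or later:

```sql
-- Hypothetical test schema; names, columns and data are assumptions for illustration.
CREATE TABLE Colours
(
    ColourID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ColourName varchar(50) NOT NULL
);

CREATE TABLE Markers
(
    MarkerID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    MarkerName varchar(50) NOT NULL,
    ColourID   int NOT NULL REFERENCES Colours (ColourID)
);

INSERT Colours (ColourName)
VALUES ('Red'), ('Green'), ('Blue');

-- Red is shared by three markers, Green by two, Blue by one.
INSERT Markers (MarkerName, ColourID)
VALUES ('M1', 1), ('M2', 1), ('M3', 1), ('M4', 2), ('M5', 2), ('M6', 3);
```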
So we have a 1:Many and we want to make this a 1:1. To do this, first queue up a list of updates (we’ll index this over some other set of unique columns to speed up merging later):
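One way the queueing step could look, sketched against hypothetical `Markers (MarkerID, ColourID)` and `Colours (ColourID, ColourName)` tables. The temp table name, the `Seq` column and the index name are assumptions; the sketch also assumes `ColourName` is unique among the existing colours, so that `(ColourName, Seq)` identifies each queued row:

```sql
-- Hypothetical queue: one row per marker that needs its own copy of a colour.
CREATE TABLE #NewColours
(
    MarkerID    int NOT NULL PRIMARY KEY,
    OldColourID int NOT NULL,
    ColourName  varchar(50) NOT NULL,
    Seq         int NOT NULL   -- 1, 2, 3... within each shared colour
);

INSERT #NewColours (MarkerID, OldColourID, ColourName, Seq)
SELECT r.MarkerID, r.ColourID, c.ColourName, r.RowNum - 1
FROM
(
    SELECT MarkerID, ColourID,
           ROW_NUMBER() OVER (PARTITION BY ColourID ORDER BY MarkerID) AS RowNum
    FROM Markers
) AS r
INNER JOIN Colours AS c
    ON c.ColourID = r.ColourID
WHERE r.RowNum > 1;   -- the first marker in each group keeps the original colour

-- Extra index over the "other" unique columns, used when matching rows back later.
-- It fails loudly if ColourName is not unique among the original colours.
CREATE UNIQUE INDEX IX_NewColours_Name_Seq ON #NewColours (ColourName, Seq);
```

Only markers after the first in each group are queued, so the working set contains just the rows that actually have to change.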
The result will have one row for every marker that needs to get a new colour. Then insert the new colours and capture the full output:
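A sketch of the insert step using the `OUTPUT ... INTO` clause, assuming a hypothetical queue table `#NewColours` with a `ColourName` column and a target table `Colours (ColourID identity, ColourName)`. A temp table is used for the captured rows here so that it survives across batches; a table variable would also work if all steps run in a single batch:

```sql
-- Hypothetical holding table for the rows generated by the insert.
CREATE TABLE #InsertedColours
(
    ColourID   int NOT NULL PRIMARY KEY,
    ColourName varchar(50) NOT NULL
);

-- Clone one colour row per queued marker and capture the new identity values.
INSERT Colours (ColourName)
OUTPUT inserted.ColourID, inserted.ColourName
    INTO #InsertedColours (ColourID, ColourName)
SELECT ColourName
FROM #NewColours;
```

In an `INSERT`, the `OUTPUT` clause can only see the `inserted` columns, not the source `MarkerID`, which is why the new rows have to be matched back to the queue in a separate step.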
And finally merge it (here’s where that extra index on the temp table comes in handy):
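A sketch of the final step, written as an `UPDATE ... FROM` join rather than a `MERGE` statement. It assumes the hypothetical objects `#NewColours (MarkerID, ColourName, Seq)` and `#InsertedColours (ColourID, ColourName)`, and pairs the n-th clone of each colour with the n-th queued marker for that colour:

```sql
-- Point each queued marker at its own freshly inserted colour row.
UPDATE m
SET ColourID = i.ColourID
FROM Markers AS m
INNER JOIN #NewColours AS n
    ON n.MarkerID = m.MarkerID
INNER JOIN
(
    -- Number the clones of each colour so they line up with the queue's Seq.
    SELECT ColourID, ColourName,
           ROW_NUMBER() OVER (PARTITION BY ColourName ORDER BY ColourID) AS Seq
    FROM #InsertedColours
) AS i
    ON i.ColourName = n.ColourName
   AND i.Seq = n.Seq;
```

Which clone goes to which marker is arbitrary here, which is fine because the clones of one colour are identical apart from their new IDs.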
This should be very efficient because it only ever has to query the production tables once. Everything else will be operating on the relatively small data in the temp tables.
Test the results:
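Two checks that could serve here, assuming the same hypothetical `Markers` and `Colours` tables. If the relationship is now 1:1, the first query should return no rows:

```sql
-- Any row returned here is a colour still shared by more than one marker.
SELECT ColourID, COUNT(*) AS MarkerCount
FROM Markers
GROUP BY ColourID
HAVING COUNT(*) > 1;

-- Full listing of markers with the colour each one now points to.
SELECT m.MarkerID, m.MarkerName, m.ColourID, c.ColourName
FROM Markers AS m
INNER JOIN Colours AS c
    ON c.ColourID = m.ColourID
ORDER BY m.MarkerID;
```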
Here’s our output:
This should be what you want, right? No cursors, no serious ugliness. If it chews up too much memory or tempdb space then you can replace the temp table / table variable with an indexed physical staging table. Even with several million rows, there’s no way this should fill up the transaction log and crash.