This is in regards to MS SQL Server 2005.
I have an SSIS package that validates data between two different data sources. If it finds differences it builds and executes a SQL update script to fix the problem. The SQL Update script runs at the end of the package after all differences are found.
I’m wondering if it is necessary or a good idea to some how break down the sql update script into multiple transactions and whats the best way to do this.
The update script looks similar to this, but longer (example):
Update MyPartTable SET MyPartGroup = (Select PartGroupID From MyPartGroupTable
Where PartGroup = "Widgets"), PartAttr1 = 'ABC', PartAttr2 = 'DEF', PartAttr3 = '123'
WHERE PartNumber = 'ABC123';
For every error/difference found an additional Update query is added to the Update Script.
I only expect about 300 updates on a daily basis, but sometimes there could be 50,000. Should I break the script down into transactions every say 500 update queries or something?
Would your system handle other processes reading the data that has yet to be updated? If so, you might want to perform multiple transactions.
The benefit of performing multiple transactions is that you will not continually accumulate locks. If you perform all these updates at once, SQL Server will eventually run out of small-grained lock resources (row/key) and upgrade to a table lock. When it does this, nobody else will be able to read from these tables until the transaction completes (unless they use dirty reads or are in snapshot mode).
The side effect is that other processes that read data may get inconsistent results.
So if nodoby else needs to use this data while you are updating, then sure, do all the updates in one transaction. If there are other processes that need to use the table, then yes, do it in chunks.