I have a batch process that imports large amounts of data. I do this by reading in a large text file, parsing it, then executing inserts, updates and deletes as dictated by the data. These are simple statements executed as stored procedures. The batch of stored procedure calls is wrapped inside a transaction to be sure the file is completely processed before moving on. The batch import is done once a week. If a client gets behind, there could be several large transactions in a row. When this happens, I occasionally get command timeouts in transactions after the first one. I have increased the command timeout to 120 seconds. That works fine so far, but on a slow computer it might not. I have noticed that the timeout often occurs on an:
Update <table> set <columns> where <pk = some value>
I thought that perhaps SQL is still updating indexes in the background. Any ideas on what is going on?
I realize that I could use something like SqlBulkCopy, but, that is not an option right now.
Thanks,
Scott
Does your sproc go row-by-row or does it do set-based INSERTS, UPDATES and so forth? The answer to this question will become important a few paragraphs down.
But first, the problem with expanding the timeout is the n+1 problem: no matter how long you make it, there will be some case that still goes over. So expanding the timeout is not a permanent, sleep-at-night-like-a-baby solution. It is much better to eliminate the need for a longer timeout by breaking up the job.
The first thing to do is eliminate that wrapping transaction. The resources required to maintain locks explode as the row count in the operation goes up, so batch operations like this often go faster if you break them up into smaller steps that require smaller transactions.
The next step, since you no longer have that wrapping transaction, is to make sure that each individual step can be safely re-run in the event of a failed job, no matter where the process was when it failed. This is called “idempotent” if you want to be fancy, or “re-runnable” if you want to use plain English.
Now we return to the question: is your sproc going row-by-row, or is it executing INSERTS that affect many rows, then UPDATES, and so on?
CASE ROW-BY-ROW: Easiest, though probably the slowest. Gobble the text file into an “INBOX” table and add a column “Processed” which is Y/N. As you go row by row, you do your INSERT, UPDATE or DELETE, then update the row in the inbox table to Processed=Y. If you pull the plug at any stage, the sproc simply resumes looking at unprocessed rows until there are none left. This gives you the same effect as the great big wrapping transaction without the overhead. You can run dozens of files in a row and the server will never time out.
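To make the row-by-row case concrete, here is a minimal sketch. It uses Python with the built-in sqlite3 module purely as a runnable stand-in; in your setup this logic would live in the stored procedure. The table and column names (inbox, target, op, pk, val, processed) and the fail_after switch are made up for illustration. One assumption I am adding: the DML and the Processed flag update go in the same small per-row transaction, so a crash can never leave a row applied but still unmarked.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical target table and inbox."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target (pk INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE inbox (
            id INTEGER PRIMARY KEY,
            op TEXT, pk INTEGER, val TEXT,
            processed TEXT NOT NULL DEFAULT 'N'
        );
        INSERT INTO inbox (op, pk, val) VALUES
            ('I', 1, 'a'), ('I', 2, 'b'), ('U', 1, 'a2'), ('D', 2, NULL);
    """)
    return conn


def process_inbox(conn, fail_after=None):
    """Apply unprocessed inbox rows one at a time.

    Returns the number of rows handled in this run. fail_after is a
    hypothetical test hook that simulates pulling the plug.
    """
    handled = 0
    while True:
        row = conn.execute(
            "SELECT id, op, pk, val FROM inbox "
            "WHERE processed = 'N' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return handled  # nothing left to do
        if fail_after is not None and handled == fail_after:
            raise RuntimeError("simulated crash")
        inbox_id, op, pk, val = row
        # One small transaction per row: the DML and the flag update
        # commit together or roll back together.
        with conn:
            if op == "I":
                conn.execute(
                    "INSERT INTO target (pk, val) VALUES (?, ?)", (pk, val))
            elif op == "U":
                conn.execute(
                    "UPDATE target SET val = ? WHERE pk = ?", (val, pk))
            elif op == "D":
                conn.execute("DELETE FROM target WHERE pk = ?", (pk,))
            conn.execute(
                "UPDATE inbox SET processed = 'Y' WHERE id = ?", (inbox_id,))
        handled += 1
```

If a run dies partway through, calling process_inbox(conn) again just picks up at the first row still marked N.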
CASE SET-BASED: If you are doing set-based DML, then you modify the INSERT so it pulls from the INBOX table and INSERTS into the target table only those rows not already there. This makes it re-runnable. DELETE statements don’t need this check: if you re-run a set-based DELETE that has already run, it simply finds nothing to delete. UPDATE is basically the same as DELETE in this respect.
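A rough sketch of the set-based case follows, again in Python with sqlite3 as a stand-in for the real stored procedure. The table and column names are hypothetical, and it assumes the inbox holds at most one row per key per operation. Each statement runs in its own small transaction, and the INSERT carries the "not already there" check so the whole function can be run any number of times.

```python
import sqlite3


def make_set_demo_db():
    """In-memory demo data: a target with two existing rows plus an inbox."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target (pk INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE inbox (op TEXT, pk INTEGER, val TEXT);
        INSERT INTO target (pk, val) VALUES (3, 'c'), (4, 'd');
        INSERT INTO inbox (op, pk, val) VALUES
            ('I', 1, 'a'), ('I', 2, 'b'), ('U', 3, 'c2'), ('D', 4, NULL);
    """)
    return conn


def apply_inbox_set_based(conn):
    """Apply the whole inbox with three set-based, re-runnable statements."""
    # INSERT only the rows that are not already in the target table.
    with conn:
        conn.execute("""
            INSERT INTO target (pk, val)
            SELECT i.pk, i.val
            FROM inbox AS i
            WHERE i.op = 'I'
              AND NOT EXISTS (SELECT 1 FROM target AS t WHERE t.pk = i.pk)
        """)
    # UPDATE needs no extra check: a re-run sets the same values again.
    with conn:
        conn.execute("""
            UPDATE target
            SET val = (SELECT i.val FROM inbox AS i
                       WHERE i.op = 'U' AND i.pk = target.pk)
            WHERE pk IN (SELECT pk FROM inbox WHERE op = 'U')
        """)
    # DELETE needs no extra check: a re-run finds nothing left to delete.
    with conn:
        conn.execute(
            "DELETE FROM target WHERE pk IN "
            "(SELECT pk FROM inbox WHERE op = 'D')")
```

Running apply_inbox_set_based(conn) a second time leaves the target table unchanged, which is exactly what you want from a job that may be restarted after a failure.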
This is general advice based on what you stated about your issue. To get more specific I would need to know more about the process.