I’m working with large databases and need advice on how to optimize my selects/updates. Here’s an ex:
create table Book (
BookID int,
Description nvarchar(max)
)
-- 8 million rows
create table #BookUpdates (
BookID int,
Description nvarchar(max)
)
-- 2 million rows
Let’s assume that there’s 8 million Books and I have to update the genre for 2 million of them.
Problem: the time to run these updates is very long. It will occasionally cause blocking for the users who are also trying to run statements off the database. I’ve come up with a solution but want to know if there’s a better one out there. I have to prepare one-off random updates like this alot (for whatever reason)
-- normal update
update b set b.Description = bu.Description
from Book b
join #BookUpdates bu
on bu.BookID = b.BookID
-- batch update
while (@BookID < @MaxBookID)
begin
update b set b.Description = bu.Description
from Book b
join #BookUpdates bu
on bu.BookID = b.BookID
where bu.BookID >= @BookID
and bu.BookID < @BookID + 5000
set @BookID = @BookID + 5000
end
The second update works a lot faster. I like this solution because I can print status updates to myself on how long it has left and it doesn’t cause performance issues on our customers.
Question: am I missing something important here? Indexes on the temp tables?
I updated the EXAMPLE tables so I don’t get more normalization comments. Only 1 description per book 🙂
You can prevent blocking on the query side by using
NOLOCKorREADUNCOMITTEDhints on the SQL queries.The real issue with performance is probably the accumulation of changes in the log. Your method of batching the changes in groups of 5,000 is quite reasonable. Because you are setting up the updates in a batch table, you might as well calculate the batch number in the table and then do the looping based on that.