I had to write a query to update all records in a table based on records that exist in two other tables. I wrote the following three iterations of the query. I think the third one is the most efficient and the first one the worst. I would like a second opinion, and to find out whether I can do better than the third version below:
P.S.: The first one is not really a valid SQL query, but pseudocode for how I planned to query the database.
SELECT AccountID,Label FROM QueueTable
For each record in query above
SELECT FeedbackID FROM FeedbackIndexed WHERE FeedbackIndexed.Label = QueueTable.Label
AND FeedbackIndexed.AccountID = QueueTable.AccountID
UPDATE FeedbackTable SET Flag = 1 WHERE FeedbackID=@FeedbackID
next
---------------------------------------------------------------------------------------------------------------------
UPDATE FeedbackTable
SET Flag = 1
WHERE FeedbackID IN(SELECT DISTINCT FeedbackID
FROM FeedbackIndexed,
QueueTable
WHERE FeedbackIndexed.Label = QueueTable.Label
AND FeedbackIndexed.AccountID = QueueTable.AccountID)
----------------------------------------------------------------------------------------------------------------------
UPDATE FeedbackTable
SET FeedbackTable.Flag = 1
FROM FeedbackTable
INNER JOIN FeedbackIndexed
ON FeedbackIndexed.FeedbackID = FeedbackTable.FeedbackID
INNER JOIN QueueTable WITH (TABLOCK)
ON FeedbackIndexed.Label = QueueTable.Label
AND FeedbackIndexed.AccountID = QueueTable.AccountID
(I used a table lock on QueueTable because at the end of this query I want to remove all records from the queue, and I don't want other parts of the app adding more records to this table while the query above runs. Is that the right way to do this?)
Both your second and third examples are valid. Here are a few points:
- You don't need the DISTINCT in your second example; it will simply add overhead. When you perform an IN operation, SQL Server will typically not perform the complete join operation and will exit early as soon as a match is found. It also doesn't return all the rows, just true/false for whether there is a match for a given value.
- The IN in your second example may yield a more efficient join operator (semi-join vs. join) because you're explicitly stating that you are not interested in the output from the subquery, just whether or not any records are returned.
- Consider rewriting it with an EXISTS clause. Although it's a common misconception that IN is less efficient than EXISTS (they are actually implemented the same way in most cases), IN can give unexpected results when dealing with nulls, particularly in its NOT IN form.

The EXISTS version would use a correlated subquery that joins FeedbackIndexed to QueueTable and matches FeedbackIndexed.FeedbackID against FeedbackTable.FeedbackID. The underlying query plan will likely be exactly the same as for your IN example (after you remove the redundant DISTINCT), and it may be the same as the plan for the third example, but it's always good to know different approaches to solving a problem.

A few more points.
- TABLOCK will be released when the query completes unless you wrap the query and the query to drop the processed records in an explicit transaction. I'm pretty sure you'll want to add HOLDLOCK here too. HOLDLOCK will hold the lock for the duration of the transaction.
- TABLOCK will implement a shared lock, which may cause a race condition if your consumer proc is running multiple instances simultaneously. Consider using TABLOCKX if that will be a problem.

I hope this helps.
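
To make the suggestions above concrete, here is a rough, untested sketch that combines the EXISTS rewrite with the locking advice. It assumes SQL Server (T-SQL) and uses only the table and column names from the question. The DELETE that clears the queue and the SET XACT_ABORT ON line are my own assumptions about how the cleanup step would be written, so adapt them to your actual code.

```sql
-- Sketch only: assumes SQL Server and the schema shown in the question.
SET XACT_ABORT ON;   -- assumption: roll the whole transaction back on a runtime error

BEGIN TRANSACTION;

-- Flag every feedback row that has at least one matching queue entry.
-- TABLOCKX takes an exclusive table lock on QueueTable, and HOLDLOCK is
-- added as suggested above so the lock is kept until the transaction ends.
UPDATE FeedbackTable
SET Flag = 1
WHERE EXISTS (SELECT 1
              FROM FeedbackIndexed
              INNER JOIN QueueTable WITH (TABLOCKX, HOLDLOCK)
                      ON FeedbackIndexed.Label = QueueTable.Label
                     AND FeedbackIndexed.AccountID = QueueTable.AccountID
              WHERE FeedbackIndexed.FeedbackID = FeedbackTable.FeedbackID);

-- Clear the processed queue while the lock is still held
-- (hypothetical cleanup step; the question only describes it in words).
DELETE FROM QueueTable;

COMMIT TRANSACTION;
```

Because the UPDATE and the DELETE share one transaction, other sessions should not be able to add rows to QueueTable between the two statements. If an exclusive lock is too heavy for your workload, swapping TABLOCKX back to TABLOCK keeps readers unblocked at the cost of the race condition mentioned above.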