I executed an update statement along the following lines yesterday:
UPDATE MainTable
Set SomeField = SubsetTable.SomeField
where MainTable.MainTableKey = SubsetTable.MainTableKey
where SubsetTable is a subset of MainTable and has the same Primary Key field. MainTable has roughly 200m records, SubsetTable has 5m records. MainTableKey is a GUID.
Both of these table have a clustered index on MainTableKey.
When I executed this query the first time it took a whopping 14 hours.
Then I added a non-clustered index to MainTableKey on both tables. Now it takes 30 minutes.
Does anyone have any ideas on why the performance gain would be so dramatic?
I bet if you look at the execution plans:
The first query is a merge join which involved reading both tables completely.
The second query is a nested loop join which reads the whole small table and seeks into the large table for each small table row.
That these rates are approximately equal, supports my theory about which rows are read.
You don’t have to guess: Run the queries with
SET STATISTICS IO ON, and/or look at the execution plans.