I am tuning a query on SQL Server 2005. Please note the real question

Asked: June 10, 2026 (2026-06-10T06:33:52+00:00)

I am tuning a query on SQL Server 2005.
Please note the real question is at the end.
I have the following query; both pto and ph have about 30 million rows. The query initially ran very slowly (3 minutes), so I added one index on each of pto and ph.

        SELECT 
            MAX(ph.txn_date_time)
        FROM 
            pto AS pto WITH (NOLOCK) 
            INNER JOIN ph AS ph WITH (NOLOCK) ON ph.receipt_id = pto.receipt_id
        WHERE 
                pto.subtype = 'ff'
            AND pto.Units_No > 0
            AND ph.branch_id = 5



CREATE NONCLUSTERED INDEX [IX_pto_subTypeUnitReceipt] ON [dbo].[pto] 
(
    [SUBTYPE] ASC,
    [Units_No] ASC,
    [RECEIPT_ID] ASC

)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [Indexes]


CREATE NONCLUSTERED INDEX [IX_ph_branchReceiptTxn] ON [dbo].[ph] 
(
    [BRANCH_ID] ASC,
    [RECEIPT_ID] ASC,
    [TXN_DATE_TIME] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [Indexes]

Now the query runs in 350 ms. Great. The execution plan is also very simple: it uses the new index on each table, does a Hash join on the receipt_id column, then a Stream Aggregate to compute MAX(ph.txn_date_time). So every column in the query is covered by the two added indexes.

The question is: why did it use a Hash join on the receipt_id column? Since RECEIPT_ID is sorted in both indexes, I expected the optimizer to use a merge join. To figure out why, I changed the first index as below (putting RECEIPT_ID before Units_No).

CREATE NONCLUSTERED INDEX [IX_pto_subTypeUnitReceipt] ON [dbo].[pto] 
(
    [SUBTYPE] ASC,
    [RECEIPT_ID] ASC,
    [Units_No] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [Indexes]

And now I see a Merge join on the RECEIPT_ID column, and the query runs in 170 ms. Obviously the optimizer now thinks RECEIPT_ID is sorted in both indexes, so a merge join is used. But I don't understand why it doesn't think so in the first case.


1 Answer

Editorial Team · Answer 1 · 2026-06-10T06:33:54+00:00

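The reason is that, in your first index on pto, RECEIPT_ID isn't the column the matching rows are ordered by. You had Units_No in the way: Units_No > 0 is a range predicate, so the rows that pass the filter come back ordered by Units_No first, and by RECEIPT_ID only within each Units_No value.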

Imagine you had a row of books ordered by publisher, then by author, then by colour. If you wanted to find all the books of a specific colour, you would need to visit each publisher section, then each author section and then find the books of the right colour. So that ‘index’ wouldn’t be very appropriate for scanning by colour, even though you could, at a stretch, say the books were sorted by colour.

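With the last index, RECEIPT_ID comes straight after SUBTYPE, and you are limiting the query to a single SUBTYPE value, so the matching rows are already in RECEIPT_ID order. The same holds for ph, where BRANCH_ID = 5 is an equality predicate and RECEIPT_ID is the next key column. Both sides therefore arrive sorted on the join column, no extra sort is needed, the cost is low and a merge join is picked.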
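
To make the ordering point concrete, here is a minimal sketch that should run on SQL Server 2005. The table variable @pto is a hypothetical stand-in for your pto table; its column types and every value in it are made up for illustration. Each ORDER BY simply mimics the key order of one of your two index definitions.

```sql
-- Hypothetical stand-in for pto: column types and all values are assumptions.
DECLARE @pto TABLE (subtype varchar(10), units_no int, receipt_id int);

INSERT INTO @pto (subtype, units_no, receipt_id) SELECT 'ff', 2, 10;
INSERT INTO @pto (subtype, units_no, receipt_id) SELECT 'ff', 3, 20;
INSERT INTO @pto (subtype, units_no, receipt_id) SELECT 'ff', 1, 30;
INSERT INTO @pto (subtype, units_no, receipt_id) SELECT 'ff', 2, 40;
INSERT INTO @pto (subtype, units_no, receipt_id) SELECT 'ff', 1, 50;
INSERT INTO @pto (subtype, units_no, receipt_id) SELECT 'ff', 0, 5;  -- excluded by units_no > 0
INSERT INTO @pto (subtype, units_no, receipt_id) SELECT 'aa', 1, 7;  -- excluded by subtype = 'ff'

-- Rows in the key order of the first index: (SUBTYPE, Units_No, RECEIPT_ID).
SELECT receipt_id
FROM @pto
WHERE subtype = 'ff' AND units_no > 0
ORDER BY subtype, units_no, receipt_id;
-- receipt_id: 30, 50, 10, 40, 20  (not in receipt_id order)

-- Rows in the key order of the second index: (SUBTYPE, RECEIPT_ID, Units_No).
SELECT receipt_id
FROM @pto
WHERE subtype = 'ff' AND units_no > 0
ORDER BY subtype, receipt_id, units_no;
-- receipt_id: 10, 20, 30, 40, 50  (already in receipt_id order)
```

If you want to check this against the real tables, one option is to put the first index back and add the OPTION (MERGE JOIN) query hint to your original query. I would expect the plan to show a Sort on receipt_id feeding the merge join, though I have not run it against your data.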
