This is SQL Server 2008. I have these two tables and a join:
    DECLARE @EmployeeCrossDay TABLE
    (
        EmployeeId UNIQUEIDENTIFIER,
        WorkDate DATE,
        OtherStuff...
    )

    DECLARE @ET TABLE
    (
        EmployeeId UNIQUEIDENTIFIER,
        WorkDate DATE,
        DifferentOtherStuff...
    )

    SELECT *
    FROM @EmployeeCrossDay ecd
    LEFT JOIN @ET et
        ON et.EmployeeId = ecd.EmployeeId
        AND et.WorkDate = ecd.WorkDate
The first table has 5,680 rows (one for each employee for each date in a range), the second has 397 (one or more for each day that the employee actually worked). (Thus, EmployeeId/WorkDate is a unique combination in the first table, but not in the second.) The results of my query are correct (a list of each employee with one or more rows for days he worked and a row for each day he didn’t work), but it takes about 3 seconds and my profile shows a Cartesian product along the way (2,254,960 rows). Is there a way to restructure this query to prevent the complete cross join?
* EDITED *
After adding the primary keys, as suggested, Set Showplan_Text On gives me this:
|--Compute Scalar(DEFINE:([Expr1007]=isnull(@ET.[StartTime] as [et].[StartTime],[Expr1010]), [Expr1008]=isnull(@ET.[EndTime] as [et].[EndTime],[Expr1010])))
|--Nested Loops(Left Outer Join, OUTER REFERENCES:([et].[ServiceCallId]))
|--Compute Scalar(DEFINE:([Expr1006]=isnull(@ET.[TypeId] as [et].[TypeId],(8)), [Expr1009]=isnull(@ET.[Interrupt] as [et].[Interrupt],($0.0000))))
| |--Nested Loops(Left Outer Join, WHERE:(@ET.[EmployeeId] as [et].[EmployeeId]=@EmployeeCrossDay.[EmployeeId] as [ecd].[EmployeeId] AND @ET.[WorkDate] as [et].[WorkDate]=@EmployeeCrossDay.[WorkDate] as [ecd].[WorkDate]))
| |--Compute Scalar(DEFINE:([Expr1010]=CONVERT_IMPLICIT(datetime,@EmployeeCrossDay.[WorkDate] as [ecd].[WorkDate],0)))
| | |--Sort(ORDER BY:([ecd].[Number] ASC, [ecd].[WorkDate] ASC))
| | |--Clustered Index Scan(OBJECT:(@EmployeeCrossDay AS [ecd]))
| |--Clustered Index Scan(OBJECT:(@ET AS [et]))
|--Clustered Index Seek(OBJECT:([Snapper].[dbo].[ServiceCalls].[PK_Jobs] AS [sc]), SEEK:([sc].[ServiceCallId]=@ET.[ServiceCallId] as [et].[ServiceCallId]) ORDERED FORWARD)
What I mean by “shows a Cartesian product along the way” comes from setting Statistics Profile on. It shows too much to paste in here, but for the next to last item in the plan (Clustered Index Scan), it shows 2,254,960 (my commas) under Rows and 5680 under Executes. Am I misreading that to say I have a Cartesian product?
I got three good answers, but they all came as comments, so I am posting ‘the’ answer here.
Aaron suggested that I add a primary key to each table var. I added
to @ET and
to @EmployeeCrossDay because, in both cases, EmployeeId was not unique.
Even though EmployeeTimeId was not involved in the join, this alone reduced my query from over 3 seconds to under 1. However, my Statistics Profile showed that one of the steps in the execution plan was running 5,680 times and hitting over 2.2 million rows (5,680 * 397). Even though the response time was acceptable, I was curious about that.
Martin then suggested that my key for @ET needed to have EmployeeId leading. So I replaced the key with
At this point, the execution plan showed the former cross join hitting only 397 rows in total (instead of over 2 million). It was still running that step once for each @EmployeeCrossDay row (5,680 executions), but each execution was now a clustered index seek instead of a full table scan.
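For readers who want to see the shape of a key that leads with EmployeeId, here is a minimal sketch. It is not the exact definition from the post: the IDENTITY surrogate on EmployeeTimeId, its INT type, and the NOT NULL constraints are assumptions made for illustration, and the other columns are omitted. The idea is that the clustered key on @ET starts with the two join columns, so the inner side of the nested loops join can seek on (EmployeeId, WorkDate) rather than scan all of @ET for every outer row.

```sql
-- Hypothetical sketch: only EmployeeId and WorkDate come from the question;
-- the surrogate column's type and the NOT NULL constraints are assumptions.
DECLARE @EmployeeCrossDay TABLE
(
    EmployeeId UNIQUEIDENTIFIER NOT NULL,
    WorkDate   DATE             NOT NULL,
    -- EmployeeId/WorkDate is unique in this table, so it can be the key itself.
    PRIMARY KEY CLUSTERED (EmployeeId, WorkDate)
);

DECLARE @ET TABLE
(
    EmployeeTimeId INT IDENTITY(1, 1) NOT NULL,  -- assumed surrogate to make rows unique
    EmployeeId     UNIQUEIDENTIFIER   NOT NULL,
    WorkDate       DATE               NOT NULL,
    -- Join columns lead the key; the surrogate trails only to guarantee uniqueness.
    PRIMARY KEY CLUSTERED (EmployeeId, WorkDate, EmployeeTimeId)
);

SELECT ecd.EmployeeId, ecd.WorkDate, et.EmployeeTimeId
FROM @EmployeeCrossDay AS ecd
LEFT JOIN @ET AS et
    ON  et.EmployeeId = ecd.EmployeeId
    AND et.WorkDate   = ecd.WorkDate;
```

With a key ordered this way, Statistics Profile would be expected to show a Clustered Index Seek on @ET with the same Executes count as before but a far smaller Rows total, since each execution touches only the matching rows.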
Along the way, Gordon suggested adding
I removed all my previously applied indexes and no step in the resulting plan executed more than once.
All three suggestions returned the same data in under a second, so I gave points (and now, my thanks) to each one.