Using SQL Server 2008.
I have multiple Locations which each contain multiple Departments which each contain multiple Items which can have zero to many Scans. Each Scan relates to a specific Operation which may or may not have a cutoff time. Each Item also belongs to a specific Package which belongs to a specific Job. Each job contains one or more Packages which contains one or more items.
+=============+ +=============+
| Locations | | Jobs |
+=============+ +=============+
^ ^
| |
+=============+ +=============+ +=============+
| Departments | <-- | Items | --> | Packages |
+=============+ +=============+ +=============+
^
|
+=============+ +=============+
| Scans | --> | Operations |
+=============+ +=============+
What I am attempting to do is get a count of Scans for a Job grouped by Location and Scan date. The tricky part is that I only want to count the first Scan by date/time per Item where the cutoff time for the Operation is not null. (NOTE: the scans definitely will NOT be in date/time order in the table.)
The query I have is getting me the correct results but it is painfully slow when the number of Items for a Job reaches 75,000 or so. I am pushing for a new server — I know our hardware is lacking — but I am wondering if there is something I am doing in the query that is bogging it down as well.
From what little I can glean from the execution plan, most of the cost of the query seems to be in the sub-query to find the first Scan for each Item. It does an index scan (0%) on an Operations table index (ID, Cutoff) and then a lazy spool (19%). It does an index seek (61%) on a Scans table index (ItemID, DateTime, OperationID, ID). The subsequent nested loops (inner join) is only 2% and the Top operator is 0%. (Not that I really understand much of what I just typed but I am trying to provide as much info as possible…)
Here is the query:
SELECT
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime))
, COUNT(Scans.ItemID) AS [COUNT]
FROM
Items
INNER JOIN Scans
ON Scans.ID =
(
SELECT TOP 1
Scans.ID
FROM
Scans
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Scans.ItemID = Items.ID
ORDER BY
Scans.DateTime
)
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Packages.JobID = @ID
GROUP BY
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime));
Which will return a sampling of results like so:
8 2012-06-08 00:00:00.000 11842
21 2012-06-07 00:00:00.000 502
11 2012-06-12 00:00:00.000 1841
15 2012-06-11 00:00:00.000 4314
16 2012-06-09 00:00:00.000 278
23 2012-06-12 00:00:00.000 1345
6 2012-06-06 00:00:00.000 2005
20 2012-06-08 00:00:00.000 352
14 2012-06-07 00:00:00.000 2408
8 2012-06-11 00:00:00.000 290
19 2012-06-10 00:00:00.000 85
20 2012-06-11 00:00:00.000 5484
7 2012-06-10 00:00:00.000 2389
16 2012-06-06 00:00:00.000 6762
18 2012-06-09 00:00:00.000 4473
14 2012-06-10 00:00:00.000 2364
1 2012-06-11 00:00:00.000 1531
22 2012-06-08 00:00:00.000 14534
5 2012-06-10 00:00:00.000 11908
9 2012-06-12 00:00:00.000 47
19 2012-06-07 00:00:00.000 559
7 2012-06-07 00:00:00.000 2576
Here’s the execution plan (not sure what I changed since the original post but the cost % are slightly different. The bottleneck still seems to be in the same area though):
I am a little leery about marking this as the answer as I am sure we can still squeeze a little juice out of the query. But this did knock my test run from 22 seconds down to 6 seconds (with an added index on Scans: OperationID including DateTime and ItemID):