Overall challenge:
We are adding items to a table several times a day for a number of “markets”.
For example:
- at 12:00 we add 2000 items for market “x”
- at 12:30 we add 3000 items for market “y”
- at 14:00 we add 2500 items for market “x” again
This is done several times each day.
At any given time we need to extract the latest items for each market for each day.
The desired result for the above insertions is:
2500 items for market “x”
3000 items for market “y”
Each addition of a batch of data has an ExecutionTime timestamp that defines the batch uniquely. So the 2000 items for market “x” at 12:00 will have the same ExecutionTime value and the 2500 items for market “x” at 14:00 will have another ExecutionTime value.
We have created a view that does this for us:
    SELECT
        *
    FROM
        dbo.Items AS s
    WHERE
        (ExecutionTime =
            (SELECT MAX(ExecutionTime) AS Expr1
             FROM dbo.Items AS s2
             WHERE (SiteAlias = s.SiteAlias) AND (Market = s.Market)
               AND (LocalTimestamp >=
                    DATEADD(dd, DATEDIFF(dd, 0, s.LocalTimestamp), 0))
               AND (LocalTimestamp <
                    DATEADD(dd, DATEDIFF(dd, 0, s.LocalTimestamp), 1))))
We query the view like this:
    SELECT *
    FROM [ExportedData]
    WHERE
        SiteAlias = 'MyAlias'
        AND LocalTimeStamp BETWEEN '2012-05-14 00:00' AND '2012-05-18 00:00'
    ORDER BY [Timestamp]
We have defined two indexes on the Items table: one on ExecutionTime, and a combined index on SiteAlias, Market and LocalTimestamp.
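In T-SQL terms, the two indexes described here would look roughly like the sketch below. The index names are made up for illustration, and the key column order of the combined index is assumed to follow the order in which the fields are listed above.

```sql
-- Hypothetical names; only the indexed columns come from the description.
CREATE INDEX IX_Items_ExecutionTime
    ON dbo.Items (ExecutionTime);

-- Combined index; column order assumed to match the order given in the text.
CREATE INDEX IX_Items_SiteAlias_Market_LocalTimestamp
    ON dbo.Items (SiteAlias, Market, LocalTimestamp);
```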
Problem: the performance sucks. It takes several minutes to query about 150000 rows.
Are there any obvious improvements we should make to the view? I am ready to supply query plans etc. in case there is no simple screwup in how we created the view.
An interesting thing is that if we query the view with “LIKE” on the SiteAlias instead of “=”, execution speeds up by about 90%, which I did not expect.
Thanks,
:o)
/Jesper
Copenhagen
Your T-SQL and table structure look fine to me at first glance – so this is just a wild shot into the dark 🙂
What I would probably try in your position would be to use a CTE (common table expression) and to cast LocalTimestamp to datatype DATE, since you’re on SQL Server 2008.
With those in place, you can have your view be something like:
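A minimal sketch of such a view, under a few assumptions: the view name is hypothetical, and SiteAlias and Market are added to the partition because the view in the question groups on them. RANK() is swapped in for a plain row number here, since every row in a batch shares the same ExecutionTime and ROW_NUMBER() would keep only one row per market and day rather than the whole latest batch.

```sql
-- Sketch only: dbo.ExportedDataRanked is a hypothetical name.
CREATE VIEW dbo.ExportedDataRanked
AS
WITH RankedItems AS
(
    SELECT
        s.*,
        -- Rows are ranked within each site, market and calendar day,
        -- newest ExecutionTime first. Ties get the same rank, so every
        -- row of the latest batch ends up with RowNum = 1.
        RANK() OVER (
            PARTITION BY s.SiteAlias, s.Market, CAST(s.LocalTimestamp AS DATE)
            ORDER BY s.ExecutionTime DESC
        ) AS RowNum
    FROM dbo.Items AS s
)
SELECT *
FROM RankedItems
WHERE RowNum = 1;
```

The view can then be queried with the same SiteAlias and LocalTimestamp filters as before. Note that `SELECT *` here also returns the extra RowNum column, so listing the columns explicitly may be preferable.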
Basically, the CTE “partitions” your data by the date-only part of LocalTimestamp and then assigns sequential numbers to all entries on that day, starting at 1, so the “newest” or “most recent” entry per day gets RowNum = 1, which is what I use in the select from that CTE.
This gets around the SELECT MAX(...) subquery and seems to be a tad faster in my personal observation, but that’s heavily dependent on your tables and data, so just try that for yourself and see if it helps!