I’m trying to optimize my SQL queries and I always come back to this one issue and I was hoping to get some insight into how I could best optimize this.
For brevity, lets say I have a simple employee table:
tbl_employees
Id HiredDateTime
------------------
1 ...
2 ...
That has versioned information in another another table for each employee:
tbl_emplyees_versioned
Id Version Name HourlyWage
-------------------------------
1 1 Bob 10
1 2 Bob 20
1 3 Bob 30
2 1 Dan 10
2 2 Dan 20
And this is how the latest version records are retrieved in a View:
Select tbl_employees.Id, employees_LatestVersion.Name, employees_LatestVersion.HourlyWage, employees_LatestVersion.Version
From tbl_employees
Inner Join tbl_employees_versioned
ON tbl_employees.Id = tbl_employees_versioned.Id
CROSS APPLY
(SELECT Id, Max(Version) AS Version
FROM tbl_employees_versioned AS employees_LatestVersion
WHERE Id = tbl_employees_versioned.Id
GROUP BY Id) AS employees_LatestVersion
To get a response like this:
Id Version Name HourlyWage
-------------------------------
1 3 Bob 30
2 2 Dan 20
When pulling a query that has over 500 employees records for which each have a couple few versions, this query starts choking up and takes a few seconds to run.
There are a couple strikes right off the bat, but I’m not sure how to overcome them.
-
Obviously the Cross Apply adds some performance loss. Is there a best practice when dealing with versioned information like this? Is there a better way to get just a record with the highest version?
-
The versioned table doesn’t have a clustered index beause neither Id or Version are unique. Concatenated together they would be, but it doesn’t work like that. Instead there is a non-clustered index for Id and another one for Version. Is there a better way to index this table to get any performance gain? Would an indexed view really help here?
I think the best way to structure the data is using start dates and end dates. So, the data structure for your original table would look like:
Then, you can see the current version using a view:
In some cases, where future end dates are allowed, the where clause would be:
Alternatively, in this case, you can default EndDate to some future date far, far away such as ’01-o1-9999′. You would add this as the default in the create table statement, make the column not null, and then you can always use the statement:
As Martin points out in his comment, the coalesce() might impede the use of an index (it does in SQL Server), whereas this does not have that problem.
This is called a slowly changing dimension. Ralph Kimball discusses this concept in some length in his books on data warehousing.