I have this existing schema where a “schedule” table looks like this (very simplified).
CREATE TABLE schedule (
    id int(11) NOT NULL AUTO_INCREMENT,
    name varchar(45),
    start_date date,
    availability int(3),
    PRIMARY KEY (id)
);
For each person it specifies a start date and the percentage of work time available to spend on this project. That availability percentage implicitly continues until a newer value is specified.
For example take a project that lasts from 2012-02-27 to 2012-03-02:
id | name | start_date | availability
-------------------------------------
1 | Tom | 2012-02-27 | 100
2 | Tom | 2012-02-29 | 50
3 | Ben | 2012-03-01 | 80
So Tom starts on Feb. 27th, full time, until Feb. 29th, from which point on he’ll be available for only 50% of his work time.
Ben only starts on March 1st, and only with 80% of his time.
Now the goal is to “normalize” this sparse data, so that there is a result row for each person for each day with the availability coming from the last specified day:
name | start_date | availability
--------------------------------
Tom | 2012-02-27 | 100
Tom | 2012-02-28 | 100
Tom | 2012-02-29 | 50
Tom | 2012-03-01 | 50
Tom | 2012-03-02 | 50
Ben | 2012-02-27 | 0
Ben | 2012-02-28 | 0
Ben | 2012-02-29 | 0
Ben | 2012-03-01 | 80
Ben | 2012-03-02 | 80
Think of a chart showing the availability of each person over time, or calculating the “resource” values in a burndown diagram.
I can easily do this with procedural code in the app layer, but would prefer a nicer, faster solution.
To make this remotely effective, I recommend creating a calendar table: one that contains each and every date of interest. You then use that as a template on which to join your data.
Equally, things improve further if you have a person table to act as the template for the name dimension of your results.
You can then use a correlated sub-query in your join to pick which record in Schedule matches the calendar, person template you have created.
Often, however, it is more efficient to deal with it in one of two other ways…
Don’t allow gaps in the data in the first place. Have a nightly batch process, or some other business logic, that ensures all relevant data points are populated.
Or deal with it in your client. Return each dimension in your report (date and name) as separate data sets to act as your templates, and then return the data as your final data set. Your client can iterate over the data and fill in the blanks as appropriate. It’s more code, but can actually use fewer resources overall than trying to fill the gaps with SQL.
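As a rough illustration of the client-side option, here is a minimal Python sketch. The function name fill_gaps and the shape of its inputs are assumptions made for this example, not part of the original schema. It assumes the rows arrive sorted by start_date, and it treats availability as 0 before a person's first entry, as in the desired output in the question.

```python
from datetime import date, timedelta


def fill_gaps(dates, names, rows):
    """Carry each person's last known availability forward over every date.

    dates: sorted list of date objects (the date template)
    names: list of person names (the name template)
    rows:  (name, start_date, availability) tuples, sorted by start_date
    """
    # Group the sparse entries per person, preserving their date order.
    by_name = {name: [] for name in names}
    for name, start, avail in rows:
        by_name[name].append((start, avail))

    result = []
    for name in names:
        entries = by_name[name]
        i = 0        # index of the next entry that is not yet in effect
        current = 0  # assumption: nobody is available before their first entry
        for day in dates:
            # Apply every entry whose start date has been reached.
            while i < len(entries) and entries[i][0] <= day:
                current = entries[i][1]
                i += 1
            result.append((name, day, current))
    return result


# Sample data from the question: a project from 2012-02-27 to 2012-03-02.
project_days = [date(2012, 2, 27) + timedelta(days=n) for n in range(5)]
people = ["Tom", "Ben"]
schedule = [
    ("Tom", date(2012, 2, 27), 100),
    ("Tom", date(2012, 2, 29), 50),
    ("Ben", date(2012, 3, 1), 80),
]

filled = fill_gaps(project_days, people, schedule)
for name, day, avail in filled:
    print(name, day.isoformat(), avail)
```

Because both the template dates and each person's entries are sorted, every entry is visited once, so the work grows with the number of output rows plus the number of schedule rows.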
(If your client-side code does this slowly, post another question examining that code. Provided that the data is sorted, this is actually quite quick to do in most languages.)
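For comparison, the calendar and person template join from the start of this answer might look something like the sketch below. The table names calendar and person and the column calendar_date are hypothetical. The query is run through Python's built-in sqlite3 module only so that the example is self-contained; the same join should carry over to MySQL with little change, but that has not been tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedule (
        id INTEGER PRIMARY KEY,
        name TEXT,
        start_date TEXT,
        availability INTEGER
    );
    INSERT INTO schedule (name, start_date, availability) VALUES
        ('Tom', '2012-02-27', 100),
        ('Tom', '2012-02-29', 50),
        ('Ben', '2012-03-01', 80);

    -- Hypothetical template tables: one row per date, one row per person.
    CREATE TABLE calendar (calendar_date TEXT PRIMARY KEY);
    INSERT INTO calendar VALUES
        ('2012-02-27'), ('2012-02-28'), ('2012-02-29'),
        ('2012-03-01'), ('2012-03-02');

    CREATE TABLE person (name TEXT PRIMARY KEY);
    INSERT INTO person VALUES ('Tom'), ('Ben');
""")

# Build the person x date template, then use a correlated sub-query to pick
# the latest schedule row that started on or before each template date.
query = """
    SELECT p.name,
           c.calendar_date,
           COALESCE(s.availability, 0) AS availability
    FROM person AS p
    CROSS JOIN calendar AS c
    LEFT JOIN schedule AS s
      ON s.id = (
          SELECT s2.id
          FROM schedule AS s2
          WHERE s2.name = p.name
            AND s2.start_date <= c.calendar_date
          ORDER BY s2.start_date DESC
          LIMIT 1
      )
    ORDER BY p.name, c.calendar_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Days before a person's first schedule row find no match in the sub-query, so the LEFT JOIN yields NULL and COALESCE turns that into 0. An index on schedule (name, start_date) would likely help the sub-query on larger tables.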