I was hoping someone could point me to some best practices regarding when to calculate a computed value which is stored in a data warehouse.
Consider the following example:
CREATE TABLE MyFact -- placeholder table name
(
    MyFactID INT NOT NULL IDENTITY(1, 1),
    OrderDimID INT NOT NULL, -- FK to OrderDimension
    StartDate DATETIME NOT NULL,
    CompletedDate DATETIME NULL,
    ElapsedCalendarTimeInMinutes INT NULL,
    ElapsedBusinessTimeInMinutes INT NULL
)
In this example, ElapsedCalendarTimeInMinutes is the time in minutes from StartDate to CompletedDate, and ElapsedBusinessTimeInMinutes is the working time that was available during those calendar days.
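To make the calendar-time definition concrete, here is a minimal T-SQL sketch. It assumes the fact table is called MyFact, which is only a placeholder name. Note that DATEDIFF counts the minute boundaries crossed between the two values, so the result can differ from the exact elapsed time by up to a minute.

```sql
-- Calendar minutes between start and completion.
-- MyFact is an assumed (placeholder) table name.
-- Rows whose CompletedDate is still NULL return NULL.
SELECT
    MyFactID,
    DATEDIFF(MINUTE, StartDate, CompletedDate) AS ElapsedCalendarTimeInMinutes
FROM MyFact;
```

The business-time column has no equivalent one-liner, since it depends on how working hours are defined.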
Currently, we are calculating this during ETL and inserting both dates. I’m wondering if this is the correct place to perform this operation.
Some other thoughts were to:
A) Store only the start and end dates in the fact table, then create an indexed view that calculates the elapsed time in minutes and has a computed column which uses a function to work out the business days.
B) Use an AFTER trigger to update the elapsed calendar time and business time whenever an insert or update sets the completed date to a non-null value.
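For reference, the two alternatives could be sketched roughly as follows. This is only an illustration under assumptions: dbo.MyFact, dbo.vMyFactElapsed and dbo.trg_MyFact_SetElapsed are placeholder names, and the business-time logic is left out because it depends on the working-hours rules. The view shown is a plain view; turning it into an indexed view would additionally require WITH SCHEMABINDING, deterministic expressions, and a unique clustered index.

```sql
-- Option A sketch: derive the calendar minutes in a view.
-- dbo.MyFact and the view name are assumed placeholders.
CREATE VIEW dbo.vMyFactElapsed
AS
SELECT
    MyFactID,
    OrderDimID,
    StartDate,
    CompletedDate,
    DATEDIFF(MINUTE, StartDate, CompletedDate) AS ElapsedCalendarTimeInMinutes
FROM dbo.MyFact;
GO

-- Option B sketch: AFTER trigger that fills in the calendar minutes
-- for inserted or updated rows whose CompletedDate is not NULL.
CREATE TRIGGER dbo.trg_MyFact_SetElapsed
ON dbo.MyFact
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE f
    SET ElapsedCalendarTimeInMinutes =
            DATEDIFF(MINUTE, i.StartDate, i.CompletedDate)
        -- ElapsedBusinessTimeInMinutes would be set here as well,
        -- using whatever working-hours logic applies (not shown).
    FROM dbo.MyFact AS f
    JOIN inserted AS i
        ON i.MyFactID = f.MyFactID
    WHERE i.CompletedDate IS NOT NULL;
END;
GO
```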
I feel that this should be done in the DB so that any change to the end date or to the business-time calculation is reflected in the stored values. Doing it during ETL seems prone to problems.
Any thoughts on this are appreciated!
Update: There are at least 6 columns determined in this way. We have business minutes, hours, and days (a day is 12 hours for our business); then we have client minutes, hours, and days (determined via a lookup table of the client’s working hours); and then we have plain calendar minutes, hours, and days (of these, only minutes are stored). Since this is a DW, I would have expected all the data to be present and not require calculation. To me, it seems like more work to ensure the ETL is correct and applied everywhere than to create a view on top of the base data to get the computed information.
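As an illustration of the "view on top of the base data" idea, the hour and day figures can be derived from the stored minutes. This is a sketch only: MyFact is a placeholder table name, the output column names are made up for the example, and 720 comes from the 12-hour business day mentioned above.

```sql
-- Derive hours and days from the stored business minutes.
-- MyFact and the *InHours / *InDays aliases are assumed names.
SELECT
    MyFactID,
    ElapsedBusinessTimeInMinutes,
    ElapsedBusinessTimeInMinutes / 60.0  AS ElapsedBusinessTimeInHours,
    ElapsedBusinessTimeInMinutes / 720.0 AS ElapsedBusinessTimeInDays  -- 12 h * 60 min
FROM MyFact;
```

Dividing by a decimal literal avoids integer division, so partial hours and days are kept.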
The simplest way should be the best solution:
In your ETL process (let us suppose it is SSIS, but you can extrapolate to other technologies):
Merge sample:
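A minimal sketch of what such a MERGE could look like, offered as an illustration and not as the original sample. It assumes a staging table dbo.MyFactStaging loaded by the ETL, a target named dbo.MyFact, one fact row per OrderDimID, and business minutes already calculated upstream; all of these are assumptions.

```sql
-- Upsert from staging into the fact table, computing the calendar
-- minutes at load time. Table names and the match key are assumed.
MERGE dbo.MyFact AS tgt
USING dbo.MyFactStaging AS src
    ON tgt.OrderDimID = src.OrderDimID
WHEN MATCHED THEN
    UPDATE SET
        tgt.StartDate     = src.StartDate,
        tgt.CompletedDate = src.CompletedDate,
        tgt.ElapsedCalendarTimeInMinutes =
            DATEDIFF(MINUTE, src.StartDate, src.CompletedDate),
        tgt.ElapsedBusinessTimeInMinutes = src.ElapsedBusinessTimeInMinutes
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderDimID, StartDate, CompletedDate,
            ElapsedCalendarTimeInMinutes, ElapsedBusinessTimeInMinutes)
    VALUES (src.OrderDimID, src.StartDate, src.CompletedDate,
            DATEDIFF(MINUTE, src.StartDate, src.CompletedDate),
            src.ElapsedBusinessTimeInMinutes);
```

Because matched rows are updated on every load, a changed CompletedDate in the source refreshes the stored elapsed values.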
This avoids triggers and indexed views.