I’m working on re-architecting a reporting/data-warehouse-style database. We currently have a table at the hotel grain (i.e. HotelID plus lots of measures, including rolling measures such as Last7DaysGross, Last28DaysXXX, etc.).
I think it would be best to move to a fact table at the Hotel/StayDate grain. However, queries that group on HotelID and include date-related measures such as Last7DaysGross need to perform very well.
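To make the Hotel/StayDate design concrete, here is a minimal sketch of the baseline it implies: computing Last7DaysGross and Last28DaysGross per hotel on the fly, using a date-range filter plus conditional aggregation. It runs on SQLite through Python's built-in sqlite3 module rather than SQL Server; the table and column names (HotelStayDateFact, StayDate, Gross), the as-of-date parameter and the sample data are all hypothetical, and in T-SQL the date arithmetic would use DATEADD instead of SQLite's date() function.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical Hotel/StayDate-grain fact table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE HotelStayDateFact (
        HotelID  INTEGER NOT NULL,
        StayDate TEXT    NOT NULL,  -- ISO 'YYYY-MM-DD', so string order = date order
        Gross    REAL    NOT NULL,
        PRIMARY KEY (HotelID, StayDate)  -- supports both the grouping and the range filter
    )
""")

# Made-up data for 40 days: hotel 1 grosses 100/day, hotel 2 grosses 10/day.
start = date(2024, 1, 1)
rows = [
    (hotel_id, (start + timedelta(days=i)).isoformat(), gross)
    for hotel_id, gross in ((1, 100.0), (2, 10.0))
    for i in range(40)
]
conn.executemany("INSERT INTO HotelStayDateFact VALUES (?, ?, ?)", rows)


def hotel_measures(conn, as_of):
    """Return {HotelID: (Last7DaysGross, Last28DaysGross)}; windows end on as_of inclusive."""
    sql = """
        SELECT HotelID,
               SUM(CASE WHEN StayDate > date(:as_of, '-7 days') THEN Gross ELSE 0 END),
               SUM(Gross)
        FROM HotelStayDateFact
        WHERE StayDate >  date(:as_of, '-28 days')  -- only scan the widest window needed
          AND StayDate <= :as_of
        GROUP BY HotelID
        ORDER BY HotelID
    """
    return {h: (l7, l28) for h, l7, l28 in conn.execute(sql, {"as_of": as_of})}


print(hotel_measures(conn, "2024-02-09"))  # {1: (700.0, 2800.0), 2: (70.0, 280.0)}
```

Every request rescans up to 28 days per hotel, which is cheap here but is exactly the cost the question worries about at production volume.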
What kind of structures would work here? I don’t think I’ll be able to use indexed views the way I had hoped, because of their many restrictions (no subqueries, etc.). To get reasonable performance, will I need to create a new table at the Hotel level, aggregated from the HotelStayDate level? That’s the level at which people will most often query. Do I actually need to create fields such as Last7DaysGross? That doesn’t seem like a good design, but I’m having a hard time coming up with another one.
Sorry this question is a little vague. Is there something else I’m missing here? I know these kinds of date-related measures are most often calculated at the front end (i.e. in a tool such as Business Objects). However, for this project, we’ll need to have them in the database.
thanks,
Sylvia
EDIT:
Thanks for all the thoughtful comments! I accepted David Marwick’s answer because of his idea of an expanded date dimension. That thought hadn’t even crossed my mind, and it sounds well worth trying.
Expanding a little on David Marwick’s thoughts, I came up with the following idea. I might try it and see how it actually works:
DateDimension
    DateKey
    DateKeyBeginLast28Days
    DateKeyEndLast28Days

Fact
    DateKey
    GrossTransactions
Then when querying:
    SELECT
        d.DateKey,                                -- qualify: both tables have a DateKey
        SumLast28Days = SUM(f.GrossTransactions)  -- rolling 28-day total for each date
    FROM Fact AS f
    JOIN DateDimension AS d
        ON f.DateKey >= d.DateKeyBeginLast28Days
       AND f.DateKey <= d.DateKeyEndLast28Days
    GROUP BY d.DateKey;
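A runnable sketch of this expanded-date-dimension idea, assuming ISO date strings as DateKey values and a 28-day window that ends on (and includes) the DateKey itself. It uses SQLite through Python's sqlite3 module rather than SQL Server, the index name is hypothetical and the sample data is made up; the join logic mirrors the query above.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DateDimension (
        DateKey                TEXT PRIMARY KEY,  -- ISO date string
        DateKeyBeginLast28Days TEXT NOT NULL,     -- DateKey minus 27 days
        DateKeyEndLast28Days   TEXT NOT NULL      -- DateKey itself (inclusive window)
    );
    CREATE TABLE Fact (
        DateKey           TEXT NOT NULL,
        GrossTransactions REAL NOT NULL
    );
    CREATE INDEX IX_Fact_DateKey ON Fact (DateKey);  -- hypothetical index for the range join
""")

start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(60)]

# Precompute each date's window boundaries once, in the (small) date dimension.
conn.executemany(
    "INSERT INTO DateDimension VALUES (?, ?, ?)",
    [(d.isoformat(), (d - timedelta(days=27)).isoformat(), d.isoformat()) for d in days],
)
# Made-up fact data: a gross of 1.0 on every day.
conn.executemany("INSERT INTO Fact VALUES (?, ?)", [(d.isoformat(), 1.0) for d in days])

# For each calendar date, sum the fact rows that fall inside its trailing 28-day window.
rolling = dict(conn.execute("""
    SELECT d.DateKey, SUM(f.GrossTransactions) AS SumLast28Days
    FROM Fact AS f
    JOIN DateDimension AS d
      ON f.DateKey >= d.DateKeyBeginLast28Days
     AND f.DateKey <= d.DateKeyEndLast28Days
    GROUP BY d.DateKey
"""))

print(rolling["2024-01-10"], rolling["2024-02-29"])  # 10.0 28.0
```

Dates whose window contains no fact rows drop out of the inner join (a LEFT JOIN from DateDimension would keep them), and early dates get partial windows: 2024-01-10 only sees the 10 days of data that exist.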
I think your design of having one table at the [Hotel, Date] grain and then rolling it up into a Hotel-level table sounds fine. As Damir points out, it keeps your read queries simple and makes it easy to add or remove aggregate measures going forward (keeping in mind that it’s generally a bad idea to design around requirements that you may have in the future).
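One way to maintain the Hotel-level table is a periodic full rebuild from the [Hotel, Date] fact. The sketch below, again using SQLite through Python's sqlite3 module, deletes and repopulates a hypothetical HotelSummary table inside a single transaction; the table names, the as-of parameter and the full-rebuild strategy are assumptions for illustration, not the only way to load an aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical [Hotel, Date]-grain fact table.
    CREATE TABLE HotelDateFact (
        HotelID  INTEGER NOT NULL,
        StayDate TEXT    NOT NULL,
        Gross    REAL    NOT NULL,
        PRIMARY KEY (HotelID, StayDate)
    );
    -- Hypothetical Hotel-grain aggregate that report queries read directly.
    CREATE TABLE HotelSummary (
        HotelID         INTEGER PRIMARY KEY,
        AsOfDate        TEXT NOT NULL,
        Last7DaysGross  REAL NOT NULL,
        Last28DaysGross REAL NOT NULL
    );
""")


def refresh_hotel_summary(conn, as_of):
    """Rebuild the Hotel-grain aggregate from the fact table in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM HotelSummary")
        conn.execute("""
            INSERT INTO HotelSummary (HotelID, AsOfDate, Last7DaysGross, Last28DaysGross)
            SELECT HotelID,
                   :as_of,
                   SUM(CASE WHEN StayDate > date(:as_of, '-7 days') THEN Gross ELSE 0 END),
                   SUM(Gross)
            FROM HotelDateFact
            WHERE StayDate >  date(:as_of, '-28 days')
              AND StayDate <= :as_of
            GROUP BY HotelID
        """, {"as_of": as_of})


# Made-up data: 30 days in March 2024; hotel 1 grosses 50/day, hotel 2 grosses 5/day.
conn.executemany(
    "INSERT INTO HotelDateFact VALUES (?, ?, ?)",
    [(h, f"2024-03-{day:02d}", g) for h, g in ((1, 50.0), (2, 5.0)) for day in range(1, 31)],
)
refresh_hotel_summary(conn, "2024-03-30")

summary = conn.execute(
    "SELECT HotelID, Last7DaysGross, Last28DaysGross FROM HotelSummary ORDER BY HotelID"
).fetchall()
print(summary)  # [(1, 350.0, 1400.0), (2, 35.0, 140.0)]
```

Adding or removing a measure then means changing one column and one expression in the INSERT ... SELECT. Note that hotels with no rows in the last 28 days do not appear in the summary at all.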
Pondlife makes good points as well. Your qualitative requirements might dictate how feasible it is to maintain an aggregate table: for example, how often the system needs to update (daily, hourly, every 15 minutes, real time?), how accurate the measures need to be (maybe the users just need a rough sense of how well each hotel is doing), how costly it is to read the source transaction data, how available that data is in the long term (does it get archived?), and so on.
If you choose to add a [Hotel, StayDate] grain fact table without maintaining an aggregate, you could explore some tricks in your dimensions to save time. For example, a 7-day date dimension containing [date, date_in_last_7_days] (so 7 records for each date) lets you use a straight equality join instead of a range query over the past 7 days, in case that saves you any time. That might be a poor example, but something along those lines. Date dimensions are small, so expanding them like this costs very little.
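The 7-day dimension can be sketched as a bridge table with one row per (report date, date within its last 7 days), so the fact table joins on equality rather than a range. Names such as DateLast7Days, ReportDate and DateInLast7Days are hypothetical, the example uses SQLite through Python's sqlite3 module with made-up data, and whether the equality join actually beats a range query on your engine is something to measure rather than assume.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HotelStayDateFact (
        HotelID  INTEGER NOT NULL,
        StayDate TEXT    NOT NULL,
        Gross    REAL    NOT NULL,
        PRIMARY KEY (HotelID, StayDate)
    );
    -- One row per (ReportDate, each of the 7 dates ending on ReportDate).
    CREATE TABLE DateLast7Days (
        ReportDate      TEXT NOT NULL,
        DateInLast7Days TEXT NOT NULL,
        PRIMARY KEY (ReportDate, DateInLast7Days)
    );
    CREATE INDEX IX_DateLast7Days_Date ON DateLast7Days (DateInLast7Days);
""")

days = [date(2024, 1, 1) + timedelta(days=i) for i in range(31)]

# 7 bridge rows per report date: the report date itself and the 6 days before it.
conn.executemany(
    "INSERT INTO DateLast7Days VALUES (?, ?)",
    [(d.isoformat(), (d - timedelta(days=k)).isoformat()) for d in days for k in range(7)],
)
# Made-up fact data: hotel 1 grosses i + 1 on the i-th day of January (0-based i).
conn.executemany(
    "INSERT INTO HotelStayDateFact VALUES (1, ?, ?)",
    [(d.isoformat(), float(i + 1)) for i, d in enumerate(days)],
)

# Equality join: each fact row matches the (up to) 7 report dates whose window contains it.
last7 = {
    (hotel, report_date): total
    for hotel, report_date, total in conn.execute("""
        SELECT f.HotelID, w.ReportDate, SUM(f.Gross)
        FROM HotelStayDateFact AS f
        JOIN DateLast7Days AS w ON f.StayDate = w.DateInLast7Days
        GROUP BY f.HotelID, w.ReportDate
    """)
}

print(last7[(1, "2024-01-07")], last7[(1, "2024-01-31")])  # 28.0 196.0
```

At 7 rows per date the bridge table holds about 2,555 rows per year (217 for the single month here), so the extra storage is negligible.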
Lastly, consider hardware optimizations such as moving tables into memory (especially dimensions or fact tables that aren’t gigantic) if you need to improve performance.