I’m implementing a basic star-schema to provide purchase reports for my company. Our fact tables are summarized with 4 dimensions and aggregated with daily, weekly, monthly and yearly totals.
The code currently knows how to process reports for single days, weeks, months and years. The next step is to implement arbitrary date range reporting. Given a range, the goal is to work out how many full years, months, weeks and days fall between the two dates and pull the appropriate records to calculate the total. The problem is that we need to count each full period of granularity between the two dates, not just the amount of time elapsed.
For example, 2 years have elapsed between ‘2009-06-29’ and ‘2011-06-29’; however, we need to know that this range consists of one full year (2010), eleven months (Jul-Dec/09 & Jan-May/11) and 31 days (Jun 29-30/09 & Jun 1-29/11, counting both end dates).
From this result we can pull the already summarized records from the 43 granular periods, combine them and present a total.
I’ve been writing test code to determine the best way to break down a date range into its component parts, but I’m stepping back as I suspect that I’m overthinking this process. The current draft works as follows:
- Populate a “datesToParse” array with the initial date range.
- Determine if one or more full years exist between the dates.
- For each year between the dates, remove that period from the date range and split the “period before” and “period after” the year into two new date ranges.
- Push the two new date ranges on the “datesToParse” stack.
- Repeat
- When all possible years have been removed from the “datesToParse” array, repeat the process for months, weeks and days.
In theory this should recursively reduce the initial date range down to a collection of full years, months, weeks and days.
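The steps above could be sketched roughly as follows in Python. This is only an illustration under some assumptions: both end dates are treated as inclusive, weeks are left out (they do not nest cleanly inside months, and would need a week-start convention), and the helper names `first_full_year`, `first_full_month` and `decompose` are hypothetical.

```python
import calendar
from collections import Counter
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def first_full_year(start, end):
    """Return (first_day, last_day) of the first whole calendar year in [start, end], or None."""
    year = start.year if (start.month, start.day) == (1, 1) else start.year + 1
    first, last = date(year, 1, 1), date(year, 12, 31)
    return (first, last) if last <= end else None

def first_full_month(start, end):
    """Return (first_day, last_day) of the first whole calendar month in [start, end], or None."""
    if start.day == 1:
        first = start
    else:
        # Day 1 plus 32 days always lands in the following month.
        first = (start.replace(day=1) + timedelta(days=32)).replace(day=1)
    last = first.replace(day=calendar.monthrange(first.year, first.month)[1])
    return (first, last) if last <= end else None

def decompose(start, end):
    """Split the inclusive range [start, end] into ('year'|'month'|'day', first_day) tuples."""
    found = []
    ranges = [(start, end)]  # plays the role of the "datesToParse" stack
    for kind, finder in (("year", first_full_year), ("month", first_full_month)):
        remaining = []  # ranges with no full period of this granularity
        while ranges:
            lo, hi = ranges.pop()
            hit = finder(lo, hi)
            if hit is None:
                remaining.append((lo, hi))
                continue
            first, last = hit
            found.append((kind, first))
            # Push the "period before" and "period after" back on the stack.
            if lo < first:
                ranges.append((lo, first - ONE_DAY))
            if last < hi:
                ranges.append((last + ONE_DAY, hi))
        ranges = remaining
    # Whatever is left cannot hold a full year or month, so emit single days.
    for lo, hi in ranges:
        day = lo
        while day <= hi:
            found.append(("day", day))
            day += ONE_DAY
    return found

parts = decompose(date(2009, 6, 29), date(2011, 6, 29))
counts = Counter(kind for kind, _ in parts)
print(counts)
```

Each returned tuple identifies one pre-summarized record to fetch (its granularity and first day), so the final total would be the sum over those records.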
Is there a better way to do this? This seems like a problem that has been solved many times before.
I don’t understand why you want to implement such a complex solution. The usual implementation is to have only one fact table with the data at the lowest level of granularity (daily in your case) and simply SUM() up the measures in your queries as required.
That is very simple to implement and maintain and queries are very easy to write (or generate from your reporting tool). Does this not work for you? What volume of data do you have? Have you implemented the date as a dimension (hopefully yes) or as a value in the fact table? Are you using a reporting tool (SSRS, Cognos, Business Objects) or rolling your own queries?
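To illustrate what I mean by a single daily-grain fact table, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`dim_date`, `fact_purchase`, `amount`) and the sample rows are made up for illustration and are not taken from your schema; ISO-formatted date strings are assumed so that `BETWEEN` compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,
        full_date TEXT NOT NULL          -- ISO 8601, e.g. '2009-06-29'
    );
    CREATE TABLE fact_purchase (
        date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
        product_key INTEGER NOT NULL,
        amount      REAL NOT NULL        -- one row per day and product
    );
""")

# Hypothetical sample data: the last row falls outside the report range.
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?)",
    [(20090629, "2009-06-29"), (20100115, "2010-01-15"),
     (20110629, "2011-06-29"), (20110630, "2011-06-30")],
)
conn.executemany(
    "INSERT INTO fact_purchase VALUES (?, ?, ?)",
    [(20090629, 1, 10.0), (20100115, 1, 20.0),
     (20110629, 2, 5.0), (20110630, 2, 99.0)],
)

def total_purchases(conn, start, end):
    """Sum the daily measure over the inclusive date range [start, end]."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(f.amount), 0)
        FROM fact_purchase AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        WHERE d.full_date BETWEEN ? AND ?
        """,
        (start, end),
    ).fetchone()
    return row[0]

total = total_purchases(conn, "2009-06-29", "2011-06-29")
print(total)
```

An arbitrary range then needs no decomposition at all: the same query serves a day, a week, a year or anything in between, and an index on the date key is usually the first thing to try if it gets slow.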
If you are thinking about performance issues, it’s quite common for a DWH to evolve like this:
Your solution sounds somewhat like a home-made OLAP implementation, but it isn’t clear why you need it. If your data volume is small to medium you will probably be able to manage it very well with indexing and partitioning. If it’s large then you’re probably looking at using OLAP and specialized reporting tools anyway, which would be a much broader issue. But you haven’t given much information about your environment or requirements, so I may be off the mark here.