I have the following scenario:
- ETL loads data into the DW.
- Reports run on demand, possibly at the same time as the ETL, consuming data from the DW.
And this is my problem: I need to make sure the reports do not contain partial data:
- If reports are running when ETL is ready to load data, ETL must wait for reports to complete.
- If ETL is loading and a report is requested, report must wait for ETL to finish.
- If ETL is waiting to load and a report is requested, report must wait for ETL to run and finish – ETL always has priority over reports.
What is the best mechanism to get this? Database locks do not seem to be intelligent enough to manage the priorities I need.
Should I implement my own locking mechanism? If yes, is there a well-known design for it? Many things must be taken into account: keeping track of currently running reports (lock-for-reads), implementing lock expiration for cases when the ETL fails to notify that it has finished, etc.
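For reference, the three rules above match what is usually called a write-preferring readers-writer lock: reports are readers, the ETL is the writer, and a waiting writer blocks new readers. A minimal in-process sketch in Python follows, assuming everything runs in one process. The class name EtlPriorityLock and its method names are hypothetical, and lock expiration is deliberately left out. A real ETL tool and reporting tool would need the same state kept somewhere both can reach, such as a control table.

```python
import threading
import time


class EtlPriorityLock:
    """Hypothetical write-preferring lock: ETL is the writer, reports are readers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active_reports = 0   # reports currently reading
        self._etl_waiting = 0      # ETL runs queued behind active reports
        self._etl_running = False  # an ETL load is in progress

    def acquire_report(self):
        with self._cond:
            # Rules 2 and 3: wait while ETL is loading OR merely waiting to load.
            while self._etl_running or self._etl_waiting > 0:
                self._cond.wait()
            self._active_reports += 1

    def release_report(self):
        with self._cond:
            self._active_reports -= 1
            self._cond.notify_all()

    def acquire_etl(self):
        with self._cond:
            self._etl_waiting += 1
            # Rule 1: wait for reports that are already running to complete.
            while self._etl_running or self._active_reports > 0:
                self._cond.wait()
            self._etl_waiting -= 1
            self._etl_running = True

    def release_etl(self):
        with self._cond:
            self._etl_running = False
            self._cond.notify_all()


def demo():
    """A report is running, then ETL arrives, then a second report arrives."""
    lock = EtlPriorityLock()
    order = []

    def etl():
        lock.acquire_etl()
        order.append("etl")
        lock.release_etl()

    def report():
        lock.acquire_report()
        order.append("report")
        lock.release_report()

    lock.acquire_report()                 # first report is already running
    t_etl = threading.Thread(target=etl)
    t_etl.start()
    while True:                           # wait until the ETL is queued
        with lock._cond:
            if lock._etl_waiting:
                break
        time.sleep(0.01)
    t_report = threading.Thread(target=report)
    t_report.start()                      # must queue behind the waiting ETL
    lock.release_report()                 # first report finishes
    t_etl.join()
    t_report.join()
    return order
```

In demo(), the second report is requested while the ETL is still waiting, so it only runs after the ETL has finished.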
If you are using Cognos, then I think you’re basically out of luck as far as a “prevent report from running if ETL is running” kind of setup goes, unless you want to muck about in badly-documented APIs.
Your best bet is probably to identify the specific reports, usually ones that are run against aggregates, and make sure that you set up your ETL process to update the facts and aggregates last, and as one big “update” transaction. If you use a DBMS that gives you read consistency, a report will see the data either as it was before the load or as it is after the commit, never with only half the data loaded.
Reports that access multiple facts or multiple aggregates will be more troublesome. You may even have to set up some kind of “table swap”, where you build what you need off to the side and then use ALTER TABLE ... RENAME TO to swap out the tables.
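A rough sketch of the “table swap” idea, using Python's built-in sqlite3 module only so the example is self-contained. The table names (sales_agg, sales_agg_staging) and the helper swap_in_new_table are made up for illustration. Whether the renames can be wrapped in one transaction depends on your DBMS: some commit implicitly on DDL, so check before relying on this.

```python
import sqlite3


def swap_in_new_table(conn, live, staging):
    """Replace table `live` with the fully built table `staging`."""
    old = live + "_old"
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Move the current table out of the way, then rename the new one in.
        conn.execute(f'ALTER TABLE "{live}" RENAME TO "{old}"')
        conn.execute(f'ALTER TABLE "{staging}" RENAME TO "{live}"')
        conn.execute(f'DROP TABLE "{old}"')
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None: we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)

# The table that reports currently read from.
conn.execute("CREATE TABLE sales_agg (region TEXT, total INTEGER)")
conn.execute("INSERT INTO sales_agg VALUES ('east', 10)")

# The ETL builds the replacement off to the side; reports never touch it.
conn.execute("CREATE TABLE sales_agg_staging (region TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO sales_agg_staging VALUES (?, ?)",
    [("east", 15), ("west", 7)],
)

swap_in_new_table(conn, "sales_agg", "sales_agg_staging")
rows = conn.execute(
    "SELECT region, total FROM sales_agg ORDER BY region"
).fetchall()
```

The point of the design is that the long-running load happens on a table no report reads, and the only step reports can collide with is the short rename.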