I’m developing a data warehouse and have come up against a problem I’m not sure how to fix. The current schema is defined below:
DimInstructor <- Dimension table for instructors
DimStudent <- Dimension table for students
If an instructor’s details change in my OLTP database, I want to add a new record to the DimInstructor table for historical reporting.
Now I want to create a lesson dimension table called DimLesson that references the instructor.
The DimInstructor table contains:
InstructorDWID <- Identity field when entered into DW
InstructorID <- The instructor ID that has come from the OLTP database
Now, I can’t make InstructorID a primary key because it isn’t guaranteed to be unique (if the instructor changes their name, there will be 2 records in the DW with the same InstructorID value).
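The change-tracking behaviour described here (a new DimInstructor row whenever an instructor’s details change) is the classic "expire and insert" load step. Below is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for the warehouse. DimInstructor, InstructorDWID and InstructorID come from the question. The Name and IsCurrent columns and the apply_instructor_change helper are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE DimInstructor (
        InstructorDWID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key assigned by the DW
        InstructorID   INTEGER NOT NULL,                   -- key from the OLTP database (not unique here)
        Name           TEXT    NOT NULL,                   -- hypothetical tracked attribute
        IsCurrent      INTEGER NOT NULL DEFAULT 1          -- hypothetical flag: 1 = latest version
    )
    """
)


def apply_instructor_change(conn, instructor_id, name):
    """Hypothetical Type 2 load step: if the details changed, expire the current row and add a new one.

    Returns the InstructorDWID of the row that is current after the call.
    """
    row = conn.execute(
        "SELECT InstructorDWID, Name FROM DimInstructor "
        "WHERE InstructorID = ? AND IsCurrent = 1",
        (instructor_id,),
    ).fetchone()
    if row is not None and row[1] == name:
        return row[0]  # nothing changed, so keep the existing version
    if row is not None:
        # Keep the old row for historical reporting, but mark it as superseded.
        conn.execute(
            "UPDATE DimInstructor SET IsCurrent = 0 WHERE InstructorDWID = ?", (row[0],)
        )
    cur = conn.execute(
        "INSERT INTO DimInstructor (InstructorID, Name) VALUES (?, ?)",
        (instructor_id, name),
    )
    conn.commit()
    return cur.lastrowid


first = apply_instructor_change(conn, 42, "Jane Smith")
second = apply_instructor_change(conn, 42, "Jane Doe")  # name change -> new row
rows = conn.execute(
    "SELECT InstructorDWID, InstructorID, Name, IsCurrent FROM DimInstructor "
    "ORDER BY InstructorDWID"
).fetchall()
print(rows)  # [(1, 42, 'Jane Smith', 0), (2, 42, 'Jane Doe', 1)]
```

After the name change, two rows share InstructorID 42. That is why InstructorID alone cannot be the primary key, while the surrogate InstructorDWID stays unique.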
So my question is: how do I reference the instructor from DimLesson? Do I use InstructorDWID? If so, and an instructor has two entries in DimInstructor, won’t that make queries more complicated when I want to look at all lessons by a specific instructor?
Any help would be appreciated!
Paul,
There are multiple ways you can handle this. You can use an effective date/inactive date pair, a sequence number, or a version number to differentiate the records with the same InstructorID.
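The version-number option can be sketched roughly as follows, again with Python’s sqlite3 as a stand-in. The table name dim_instructor, the columns instr_id, version and name, and the add_version helper are all hypothetical. The point is that the OLTP key is unique only in combination with the version, which a composite UNIQUE constraint can enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_instructor (
        instr_guid INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, independent of OLTP
        instr_id   INTEGER NOT NULL,                   -- hypothetical name for the OLTP key
        version    INTEGER NOT NULL,                   -- 1, 2, 3, ... per instr_id
        name       TEXT    NOT NULL,
        UNIQUE (instr_id, version)                     -- unique together, not individually
    )
    """
)


def add_version(conn, instr_id, name):
    """Hypothetical helper: insert the next version for instr_id and return its number."""
    (next_version,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM dim_instructor WHERE instr_id = ?",
        (instr_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO dim_instructor (instr_id, version, name) VALUES (?, ?, ?)",
        (instr_id, next_version, name),
    )
    return next_version


v1 = add_version(conn, 42, "Jane Smith")
v2 = add_version(conn, 42, "Jane Doe")

# Re-inserting an existing (instr_id, version) pair violates the composite key.
duplicate_rejected = False
try:
    conn.execute(
        "INSERT INTO dim_instructor (instr_id, version, name) VALUES (42, 1, 'dup')"
    )
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)
```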
A dimension that captures all the relevant details would look something like this:
instr_guid is directly generated from a sequence and is independent of the OLTP system.
This lets you capture the full history of details for a given instructor.
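A minimal sketch of such a dimension using the effective date/inactive date option, assuming dates are stored as ISO-8601 text (so string comparison orders them correctly) and that inactive_date is the last day a version was valid, left NULL for the current version. Apart from instr_guid, the column names and the guid_as_of helper are hypothetical. The lookup shows how a load process might pick the instr_guid that was valid on the date a lesson took place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_instructor (
        instr_guid     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key from a sequence
        instr_id       INTEGER NOT NULL,                   -- OLTP key (hypothetical name)
        name           TEXT    NOT NULL,
        effective_date TEXT    NOT NULL,                   -- ISO-8601 date the version starts
        inactive_date  TEXT                                -- last valid day; NULL while current
    )
    """
)
conn.executemany(
    "INSERT INTO dim_instructor (instr_id, name, effective_date, inactive_date) "
    "VALUES (?, ?, ?, ?)",
    [
        (42, "Jane Smith", "2010-01-01", "2011-06-30"),
        (42, "Jane Doe", "2011-07-01", None),
    ],
)


def guid_as_of(conn, instr_id, as_of):
    """Hypothetical helper: instr_guid of the version of instr_id valid on as_of, or None."""
    row = conn.execute(
        "SELECT instr_guid FROM dim_instructor "
        "WHERE instr_id = ? AND effective_date <= ? "
        "AND (inactive_date IS NULL OR inactive_date >= ?)",
        (instr_id, as_of, as_of),
    ).fetchone()
    return None if row is None else row[0]


print(guid_as_of(conn, 42, "2011-01-15"))  # 1 (Jane Smith)
print(guid_as_of(conn, 42, "2012-03-01"))  # 2 (Jane Doe)
```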
You can use just instr_guid as the foreign key in the fact table, but including both keys (instr_guid and the OLTP InstructorID) makes querying easier, which is one of the goals of data warehousing.
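A small sketch of what carrying the OLTP key alongside the surrogate key buys you, assuming a hypothetical fact_lesson table and a hypothetical instr_id column for the OLTP key. "All lessons by instructor 42" becomes a simple filter across every version of that instructor. The same result is still available through a join when the fact table holds only instr_guid. Joining on instr_guid gives the instructor’s details as they were when each lesson was recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_instructor (
        instr_guid INTEGER PRIMARY KEY,   -- surrogate key
        instr_id   INTEGER NOT NULL,      -- OLTP key (hypothetical name)
        name       TEXT    NOT NULL
    );
    CREATE TABLE fact_lesson (            -- hypothetical fact table for lessons
        lesson_id   INTEGER PRIMARY KEY,
        instr_guid  INTEGER NOT NULL REFERENCES dim_instructor (instr_guid),
        instr_id    INTEGER NOT NULL,     -- OLTP key carried alongside the surrogate
        lesson_date TEXT    NOT NULL
    );
    INSERT INTO dim_instructor VALUES (1, 42, 'Jane Smith'), (2, 42, 'Jane Doe');
    INSERT INTO fact_lesson VALUES
        (100, 1, 42, '2011-01-15'),
        (101, 2, 42, '2012-03-01'),
        (102, 2, 42, '2012-03-08');
    """
)

# With instr_id in the fact table: all lessons by instructor 42, no join needed.
all_lessons = conn.execute(
    "SELECT lesson_id FROM fact_lesson WHERE instr_id = ? ORDER BY lesson_id", (42,)
).fetchall()

# With only instr_guid: the same question needs a join through the dimension.
via_join = conn.execute(
    "SELECT f.lesson_id FROM fact_lesson AS f "
    "JOIN dim_instructor AS d ON d.instr_guid = f.instr_guid "
    "WHERE d.instr_id = ? ORDER BY f.lesson_id",
    (42,),
).fetchall()

# Historical view: instructor details as they were for each lesson.
historical = conn.execute(
    "SELECT f.lesson_id, d.name FROM fact_lesson AS f "
    "JOIN dim_instructor AS d ON d.instr_guid = f.instr_guid ORDER BY f.lesson_id"
).fetchall()

print(all_lessons)  # [(100,), (101,), (102,)]
print(historical)   # [(100, 'Jane Smith'), (101, 'Jane Doe'), (102, 'Jane Doe')]
```

The trade-off is a slightly wider fact table in exchange for simpler queries across all versions of an instructor.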
Useful Links:
http://en.wikipedia.org/wiki/Surrogate_key
http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2