I have a Client Dimension and a Fact table that tracks Sessions with Clients; these have the following columns:
Code:
[DimClient]
----------
PK_ClientKey
ClientNumber
EmailAddress
Postcode
PostcodeLongitude
PostcodeLatitude
DateOfBirth
Gender *
Sexuality *
CulturalIdentity *
LanguageSpokenAtHome *
CountryOfBirth
UsualAccommodation *
LivingWith *
OccupationStatus *
HighestLevelOfSchooling *
RegistrationDate
LastLoginDate
Status
[FactSession]
-------------
PK_SessionKey
FK_ClientKey
...
My first requirement was to group Clients by their age at the time of a specific Session (FactSession). The best way to approach this was to create an Age Group dimension (DimAgeGroup) and add a foreign key (FK_AgeGroupKey) in FactSession that references it.
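Assigning FK_AgeGroupKey at load time means computing each Client's age on the Session date (not today's date) and finding the band it falls into. A minimal Python sketch, assuming hypothetical band boundaries, a hypothetical "Unknown" key of -1 for a missing DateOfBirth, and that the session date is available when the fact row is built:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AgeGroup:
    key: int      # surrogate key in the (hypothetical) DimAgeGroup
    label: str
    min_age: int  # inclusive
    max_age: int  # inclusive


# Hypothetical bands; in practice these rows would be read from DimAgeGroup.
AGE_GROUPS = [
    AgeGroup(1, "Under 18", 0, 17),
    AgeGroup(2, "18-24", 18, 24),
    AgeGroup(3, "25-34", 25, 34),
    AgeGroup(4, "35-49", 35, 49),
    AgeGroup(5, "50+", 50, 200),
]
UNKNOWN_AGE_GROUP_KEY = -1  # hypothetical "Unknown" row


def age_at(date_of_birth: date, on: date) -> int:
    """Completed years between date_of_birth and the given date."""
    had_birthday = (on.month, on.day) >= (date_of_birth.month, date_of_birth.day)
    return on.year - date_of_birth.year - (0 if had_birthday else 1)


def age_group_key(date_of_birth: Optional[date], session_date: date) -> int:
    """FK_AgeGroupKey for a Client's age on the Session date."""
    if date_of_birth is None:
        return UNKNOWN_AGE_GROUP_KEY
    age = age_at(date_of_birth, session_date)
    for group in AGE_GROUPS:
        if group.min_age <= age <= group.max_age:
            return group.key
    return UNKNOWN_AGE_GROUP_KEY
```

Because the age is computed per Session, the same Client can land in different age groups on different FactSession rows as they get older.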
Now I'm thinking it would be good to track all the columns marked with an * (above). These could (not yet proven) correlate strongly with Sessions. Reading through the DWH Toolkit, it seems a Mini Dimension accommodating all the * columns along with the Age Group would suit best, so I put together the following structure:
Code:
[DimClient]
----------
PK_ClientKey
ClientNumber
...
Status
[DimDemographic]
-----------------
PK_DemographicKey
AgeGroup
Gender
Sexuality
...
HighestLevelOfSchooling
[FactSession]
-------------
PK_SessionKey
FK_ClientKey
FK_DemographicKey
The DimDemographic table would need to use SCD Type 2 to track changes over time. Would this be the best approach for my requirements?
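To make the Mini Dimension mechanics concrete, here is a minimal in-memory Python sketch of how a fact load might assign FK_DemographicKey. It assumes a hypothetical subset of the starred columns (the real DimDemographic would hold all of them) and hypothetical surrogate keys starting at 1. Each distinct combination of values gets one DimDemographic row; in this sketch that row never changes, and a change in a Client's profile shows up as a different FK_DemographicKey on later FactSession rows.

```python
from itertools import count
from typing import Dict, List, NamedTuple, Tuple


class Demographic(NamedTuple):
    # Hypothetical subset of the * columns plus AgeGroup.
    age_group: str
    gender: str
    usual_accommodation: str
    living_with: str


class DemographicDim:
    """Stand-in for DimDemographic: one row per distinct profile."""

    def __init__(self) -> None:
        self._keys: Dict[Demographic, int] = {}
        self._next_key = count(1)

    def key_for(self, profile: Demographic) -> int:
        # Reuse the existing row for this combination, or add a new one.
        if profile not in self._keys:
            self._keys[profile] = next(self._next_key)
        return self._keys[profile]

    def __len__(self) -> int:
        return len(self._keys)


dim = DemographicDim()
# Rows of (PK_SessionKey, FK_ClientKey, FK_DemographicKey)
fact_session: List[Tuple[int, int, int]] = []


def load_session(session_key: int, client_key: int, profile: Demographic) -> None:
    fact_session.append((session_key, client_key, dim.key_for(profile)))


profile_a = Demographic("18-24", "Female", "Rented", "Parents")
profile_b = profile_a._replace(living_with="Partner")

load_session(1, 42, profile_a)
load_session(2, 42, profile_a)
load_session(3, 42, profile_b)  # Client 42's LivingWith changed
load_session(4, 7, profile_a)   # a different Client sharing profile A's row
```

Because several Clients can share a profile, DimDemographic stays much smaller than DimClient, which is the usual motivation for splitting these columns out.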
Additionally, I have RegistrationDate and LastLoginDate columns on my Client Dimension. If a Client registers but never logs in, what would be the best value for LastLoginDate: something like '1900-01-01', or NULL?
Sorry for the long post, but hopefully I have given enough information. Thanks in advance!
Yes, the above solution should work fine. It supports your need to track changes over time; otherwise, you could have included the DimDemographic linkage directly in DimClient.
Regarding the date question, I believe you should use NULL: it means there is no value because there was no login. Also, identifying Clients who have never logged in is then simply a matter of filtering on LastLoginDate IS NULL.
For me this reads much better than a query that uses an artificial date.
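A minimal sketch of the NULL approach using Python's built-in sqlite3 module, with a cut-down DimClient and hypothetical sample rows. Never-logged-in Clients are found with IS NULL, and aggregates such as MIN skip the NULLs instead of returning an artificial date like '1900-01-01'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimClient (
        PK_ClientKey     INTEGER PRIMARY KEY,
        ClientNumber     TEXT NOT NULL,
        RegistrationDate TEXT NOT NULL,  -- ISO-8601 date strings
        LastLoginDate    TEXT            -- NULL means never logged in
    )
""")
conn.executemany(
    "INSERT INTO DimClient VALUES (?, ?, ?, ?)",
    [
        (1, "C001", "2023-01-10", "2023-03-02"),
        (2, "C002", "2023-02-14", None),  # registered, never logged in
        (3, "C003", "2023-05-01", "2023-05-01"),
    ],
)

# Clients who registered but never logged in.
never_logged_in = conn.execute(
    "SELECT ClientNumber FROM DimClient WHERE LastLoginDate IS NULL"
).fetchall()

# MIN ignores NULLs, so no sentinel needs filtering out.
earliest_login = conn.execute(
    "SELECT MIN(LastLoginDate) FROM DimClient"
).fetchone()[0]
```

With a '1900-01-01' sentinel instead, the same MIN would return 1900-01-01 unless every query remembered to exclude it.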