I have an application that I know would make a great cube and would be useful for more than the standard flat Reporting Services report. We’re about to jump into BI stuff with a consultant, but I’d like to give it a shot before we do, mostly so I know something of what we’re going to do.
The application tracks surveys in nursing homes across the country. They can be annual, complaint, or several other type of survey, they have penalties associated with tags given, and have documentation associated with them.
What I’d like to do is come up with a way that will allow us to leverage the data we have – how many tags in florida for the month of June? How many facilities were on time delivering their documentation? How many annual(surprise) surveys happened in the 1st quarter of this year compared to last year?
I’m including the schemas in hopes that someone will be able to tell me not only what is dim and what is fact, but what data goes where. I figure that’ll be a great start.
Anything would be really helpful. I’m trying to get a small data mart set up while I’m pouring through the Data Warehouse Lifecycle Toolkit by Kimball.
Thanks!
M@
The Entity table – a list of all of our facilities: Primary key is a five letter code denoting the building
CREATE TABLE [dbo].[Entity](
[entID] [varchar](10) NOT NULL,
[entShortName] [varchar](150) NULL,
[entNumericID] [int] NOT NULL,
[orgID] [int] NOT NULL,
[regionID] [int] NOT NULL,
[portID] [int] NOT NULL,
[busTypeID] [int] NOT NULL,
[adpID] [varchar](50) NULL,
[eHealthDataID] [varchar](50) NULL,
[updateDate] [datetime] NULL CONSTRAINT [DF_Entity_updateDate] DEFAULT (getdate()),
[powProID] [int] NULL,
[regionReportingID] [int] NULL,
[regionPresEmail] [varchar](300) NULL,
[regionClinDirEmail] [varchar](300) NULL,
CONSTRAINT [PK_EntityNEW] PRIMARY KEY CLUSTERED
(
[entID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY]
Survey Main
CREATE TABLE [dbo].[surveyMain](
[surveyID] [int] IDENTITY(1,1) NOT NULL,
[surveyDateFac] AS (([facility]+'-')+CONVERT([varchar],[surveyDate],(101))),
[surveyDate] [datetime] NOT NULL,
[surveyType] [int] NOT NULL,
[surveyBy] [int] NULL,
[facility] [varchar](10) NOT NULL,
[originalSurvey] [int] NULL,
[exitDate] [datetime] NULL,
[dpnaDate] AS (dateadd(month,(3),[exitDate])),
[clearedTags] [varchar](1) NULL,
[substantiated] [varchar](1) NULL,
[firstRevisit] [int] NULL,
[secondRevisit] [int] NULL,
[thirdRevisit] [int] NULL,
[fourthRevisit] [int] NULL,
[updated] [datetime] NULL CONSTRAINT [DF_surveyMain_updated] DEFAULT (getdate()),
CONSTRAINT [PK_tagSurvey] PRIMARY KEY CLUSTERED
(
[surveyID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
Survey Types:
CREATE TABLE [dbo].[surveyTypes](
[surveyTypeID] [int] IDENTITY(1,1) NOT NULL,
[surveyTypeDesc] [varchar](100) NOT NULL,
CONSTRAINT [PK_surveyTypes] PRIMARY KEY CLUSTERED
(
[surveyTypeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Survey Files
CREATE TABLE [dbo].[surveyFiles](
[surveyFileID] [int] IDENTITY(1,1) NOT NULL,
[surveyID] [int] NOT NULL,
[surveyFilesTypeID] [int] NOT NULL,
[documentDate] [datetime] NOT NULL,
[responseDate] [datetime] NULL,
[receiptDate] [datetime] NULL,
[dateCertain] [datetime] NULL,
[fileName] [varchar](250) NULL,
[fileUpload] [image] NULL,
[fileDesc] [varchar](100) NULL,
[updated] [datetime] NOT NULL CONSTRAINT [DF_surveyFiles_updated] DEFAULT (getdate()),
CONSTRAINT [PK_surveyFiles] PRIMARY KEY CLUSTERED
(
[surveyFileID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Survey Fines
CREATE TABLE [dbo].[surveyFines](
[surveyFinesID] [int] IDENTITY(1,1) NOT NULL,
[surveyID] [int] NULL,
[surveyFinesTypeID] [int] NULL,
[dateRecommended] [datetime] NULL,
[dateImposed] [datetime] NULL,
[totalFineAmt] [varchar](100) NULL,
[wasImposed] [varchar](3) NULL,
[dateCleared] [datetime] NULL,
[comments] [varchar](500) NULL,
[updated] [datetime] NOT NULL CONSTRAINT [DF_surveyFines_updated] DEFAULT (getdate()),
CONSTRAINT [PK_surveyFines] PRIMARY KEY CLUSTERED
(
[surveyFinesID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY]
Survey Tags
CREATE TABLE [dbo].[surveyTags](
[seq] [int] IDENTITY(1,1) NOT NULL,
[surveyID] [int] NOT NULL,
[tagDescID] [int] NOT NULL,
[tagStatus] [int] NULL,
[scopesev] [varchar](5) NOT NULL,
[comments] [varchar](1000) NULL,
[clearedDate] [datetime] NULL,
[updated] [datetime] NULL CONSTRAINT [DF_surveyTags_updated] DEFAULT (getdate()),
CONSTRAINT [PK_tagMain] PRIMARY KEY CLUSTERED
(
[seq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
This is quite a task for a support forum, so I will focus on just one part of the problem.
Seems that one survey can consists of several visits, so I would suggest factSurveyVisit with a grain of one visit-event. The column SurveyID acts as a degenerate dimension in this model and is common to all visits from the same survey. The SurveyVisitSequenceID is a unique auto-increment (integer) and is used to simplify linking of the two bridge tables for documents and tags to the fact table.
You could also promote a survey into a full dimension dimSurvey to add some notes etc; use SurveyID for link.
I did not tackle fines here, for this I would suggest factFine table which would have its own links to dimDate, dimTime, dimFacility, etc so that reports regarding fines ($$) can be done fast without joining to most of the visit related tables. There should also be a bridge table joining factFine to factSurveyVisit, providing fines are related to each visit and not to a completed survey.
EDIT
Just noticed that your Tag table has date_cleared, so admittedly I do not understand the tagging in this business. In the model, dimTag is just a list of available tags. There may be one more factFacilityStatus table linking dimFacility and dimTag, tracking tag status for each facility.