I’m working on a project where I need to analyze Apache logs using SSAS. I’ve already loaded the data into a temporary table. I created dimension tables (primary key and attribute_name) and an empty fact table (foreign keys for each dimension table and fact_attribute), and created the relations between them. Then I split the data from that table into the dimension tables using
INSERT INTO DimIP (IP) SELECT DISTINCT RemoteHostName FROM tmp
…and so on.
Now I need to populate the fact table with foreign keys, but I have no idea how to do this with a single query. I tried something like this:
INSERT INTO Facts (DimDateID, DimIPID, DimRefererID, DimRequestID, DimStatusCodeID, DimUserAgentID)
SELECT DimDate.ID WHERE (DimDate.Data = tmp.DateTime)
SELECT DimIP.ID WHERE (DimIP.IP = tmp.RemoteHostName)
SELECT DimReferer.ID WHERE (DimReferer.Referer = tmp.Referer)
SELECT DimRequest.ID WHERE (DimRequest.Request = tmp.Request)
SELECT DimStatusCode.ID WHERE (DimStatusCode.StatusCode = tmp.StatusCode)
SELECT DimUserAgent.ID WHERE (DimUserAgent.UserAgent = tmp.UserAgent)
But it doesn’t work (it says the insert list contains fewer items than the select list); I probably can’t use such syntax.
I tried doing it one by one, like this:
INSERT INTO Facts (DimDateID)
SELECT DimDate.ID WHERE (DimDate.Data = tmp.DateTime)
But sometimes it says that another column can’t be NULL (e.g. DimUserAgentID), so the query fails; sometimes it executes the query and says “806000 rows affected”, but nothing is inserted.
I would appreciate your help, because I have already torn out half of my hair and still don’t know how to populate the fact table with foreign keys from the dimension tables.
I believe what you need to do is reference those other tables in your query. Below I use tmp as the main driver of the query and then attempted to look up the resulting ID based on the logic you provided. Those lookups are via LEFT OUTER JOINs, which implies the relationship may not be there, in which case NULL will go into your fact table. If you’d rather have the row filtered out of hitting the fact table, substitute an INNER JOIN for all of the occurrences. I also assumed your tables were all in the dbo schema.
Finally, it seems you’re missing something measurable, unless you’re just counting rows in the Facts table.
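A sketch of the kind of query described above, written for SQL Server (T-SQL). The table and column names come from the question; the dbo schema, the ID key columns, and the single-letter aliases are assumptions, so adjust them to the real schema.

```sql
-- Sketch only: names are taken from the question; the dbo schema and
-- the table aliases are assumptions, not a confirmed schema.
INSERT INTO dbo.Facts
    (DimDateID, DimIPID, DimRefererID, DimRequestID, DimStatusCodeID, DimUserAgentID)
SELECT
    d.ID,   -- key of the matching DimDate row, or NULL if none
    i.ID,   -- key of the matching DimIP row
    r.ID,   -- key of the matching DimReferer row
    q.ID,   -- key of the matching DimRequest row
    s.ID,   -- key of the matching DimStatusCode row
    u.ID    -- key of the matching DimUserAgent row
FROM dbo.tmp AS t                     -- one fact row per log row in tmp
LEFT OUTER JOIN dbo.DimDate       AS d ON d.Data       = t.[DateTime]
LEFT OUTER JOIN dbo.DimIP         AS i ON i.IP         = t.RemoteHostName
LEFT OUTER JOIN dbo.DimReferer    AS r ON r.Referer    = t.Referer
LEFT OUTER JOIN dbo.DimRequest    AS q ON q.Request    = t.Request
LEFT OUTER JOIN dbo.DimStatusCode AS s ON s.StatusCode = t.StatusCode
LEFT OUTER JOIN dbo.DimUserAgent  AS u ON u.UserAgent  = t.UserAgent;
```

If the foreign key columns in Facts are declared NOT NULL, any row in tmp without a matching dimension row will make this insert fail with the same "can't be NULL" error; in that case either switch the joins to INNER JOIN or add the missing dimension rows first.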