My question:
I have a MySQL database that consists of something like a fact table (although not every field is a lookup) and a variety of other tables. When I want to display data from that “fact” table, is it necessary to run a separate query for each individual lookup, or is there a way to make a temporary table that has already done the “looking up”?
Example:
Table structure:
- unique_id (auto-increment int)
- model (int, lookup to table #2)
- type (int, lookup from table #2 to table #3)
- employee (int, lookup to table #4)
- notes (text)
- cost (float)
- hours (float)
So, for instance, when I want to make a PHP page to enter this data, it seems like a lot more “work” than it needs to be:
- unique_id (not shown as a data entry field; increments automatically on submit)
- model (drop-down box; populating it requires a query to table #2 where status = X)
- type (read-only text box showing the type of the model; requires a query to table #3 based on a column from table #2)
- employee (drop-down box; populating it requires a query to table #4 where employee_status = “Active”)
- notes (text box; user inputs notes related to the submission)
- cost (text box; user enters costs related to the submission)
- hours (text box; user enters hours related to the submission)
Just to get a simple form populated with valid data requires what seems to me like A LOT of queries/lookups.
Is this the best way? Is there a better way?
Aside: I have control over the data structure, so if the problem is the database design, then those suggestions would be helpful as well.
Dimension tables typically don’t change very often, at least relative to the number of inserts to the fact table. Dimension tables are also individually much smaller than the fact table. This makes dimension tables good candidates for caching.
What some people do to good effect is render the partial HTML output for the form, with all the data populated as drop-downs, radio buttons, etc., and then store that partial HTML under a memcached key. For most PHP requests you then skip both the database queries and the HTML rendering: you just fetch the pre-populated HTML fragment out of memcached and echo it verbatim. I think of this as the “IKEA” of database-driven output.
Of course, if you ever do change data in a dimension table, you’d want to invalidate the cached HTML or, even better, regenerate it and store the new version of the HTML in memcached.
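To make the caching idea concrete, here is a minimal sketch rather than a definitive implementation. It is written in Python so it runs as-is: an in-memory SQLite database stands in for MySQL, and a plain dict stands in for memcached. The `employee` table layout, the cache key name, and all function names are assumptions made for illustration; only `employee_status = 'Active'` comes from the question.

```python
import html
import sqlite3

CACHE_KEY = "form:employee_options"  # hypothetical cache key name


def render_employee_options(conn):
    """Query the dimension table and render the <option> tags."""
    rows = conn.execute(
        "SELECT id, name FROM employee "
        "WHERE employee_status = 'Active' ORDER BY name"
    ).fetchall()
    return "".join(
        f'<option value="{emp_id}">{html.escape(name)}</option>'
        for emp_id, name in rows
    )


def get_employee_options(conn, cache):
    """Return the cached fragment; render and store it only on a miss."""
    fragment = cache.get(CACHE_KEY)
    if fragment is None:
        fragment = render_employee_options(conn)
        cache[CACHE_KEY] = fragment
    return fragment


def update_employee_status(conn, cache, emp_id, status):
    """Change a dimension row, then regenerate the cached fragment."""
    conn.execute(
        "UPDATE employee SET employee_status = ? WHERE id = ?",
        (status, emp_id),
    )
    cache[CACHE_KEY] = render_employee_options(conn)


def demo():
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    conn.execute(
        "CREATE TABLE employee "
        "(id INTEGER PRIMARY KEY, name TEXT, employee_status TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee (id, name, employee_status) VALUES (?, ?, ?)",
        [(1, "Ann", "Active"), (2, "Bob", "Active"), (3, "Cy", "Inactive")],
    )
    cache = {}  # stand-in for a memcached client
    first = get_employee_options(conn, cache)   # miss: queries and renders
    second = get_employee_options(conn, cache)  # hit: no query at all
    update_employee_status(conn, cache, 2, "Inactive")
    third = get_employee_options(conn, cache)   # hit on the regenerated fragment
    return first, second, third
```

With a real memcached client the dict assignment would become that client's set call, and the same pattern applies to each drop-down (or to the whole form fragment at once).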
Regarding doing all the lookups, I’ll point out that there’s no requirement to use pseudokeys in a fact table. You can store the natural values and make them reference the primary key of the dimension table, which can itself be a natural key instead of a pseudokey. It might take a bit more space in some cases, but it eliminates the lookups when you display fact rows, because the readable value is already in the row. Of course, it may still make sense to keep pseudokeys for dimensions whose values are long varchars.
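As a rough sketch of the natural-key approach, again in Python with an in-memory SQLite database standing in for MySQL: the table name `submission`, the `employee.name` column, and the helper functions are hypothetical, and the DDL would need adapting to MySQL syntax. The point is only that the fact table stores the readable employee value directly while a foreign key still restricts it to valid dimension rows.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    # SQLite needs this pragma to enforce foreign keys on a connection.
    conn.execute("PRAGMA foreign_keys = ON")
    # Dimension table keyed by the natural value instead of a pseudokey.
    conn.execute(
        "CREATE TABLE employee ("
        " name TEXT PRIMARY KEY,"
        " employee_status TEXT NOT NULL)"
    )
    # Fact table stores the natural value and references the dimension.
    conn.execute(
        "CREATE TABLE submission ("
        " unique_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " employee TEXT NOT NULL,"
        " notes TEXT,"
        " cost REAL,"
        " hours REAL,"
        " FOREIGN KEY (employee) REFERENCES employee(name)"
        "   ON UPDATE CASCADE)"
    )
    conn.execute("INSERT INTO employee VALUES ('Ann', 'Active')")
    conn.execute(
        "INSERT INTO submission (employee, notes, cost, hours) "
        "VALUES ('Ann', 'demo', 12.5, 2.0)"
    )
    return conn


def display_rows(conn):
    """Read fact rows for display without joining to the dimension."""
    return conn.execute(
        "SELECT unique_id, employee, cost, hours FROM submission"
    ).fetchall()


def try_unknown_employee(conn):
    """Return True if an unknown employee is accepted, False if rejected."""
    try:
        conn.execute("INSERT INTO submission (employee) VALUES ('Nobody')")
    except sqlite3.IntegrityError:
        return False
    return True
```

Populating the drop-downs on the entry form still needs the dimension rows, so this complements the cached fragment rather than replacing it.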