QUESTION: Is it okay to have “shortcut” identifiers in a table so that I don’t have to do a long string of joins to get the information I need?
To understand what I’m talking about, I’m going to have to lay out an example here. It looks pretty complicated, but I’ve simplified the problem quite a bit, and it should be easily understood (I hope).
The basic setup: A “company” can be an “affiliate”, a “client” or both. Each “company” can have multiple “contacts”, some of which can be “users” with login privileges.
`Company` table
----------------------------------------------
ID  Company_Name             Address
--  -----------------------  -----------------
1   Acme, Inc.               101 Sierra Vista
2   Spacely Space Sprockets  East Mars Colony
3   Cogswell Cogs            West Mars Colony
4   Stark Industries         Los Angeles, CA
We have four companies in our database.
`Affiliates` table
-----------------------------
ID  Company_ID  Price  Sales
--  ----------  -----  -----
1   1           50     456
2   4           50     222
3   1           75     14
Each company can have multiple affiliate IDs so that it can represent the products at different pricing levels to different markets.
Two of our companies are affiliates (Acme, Inc. and Stark Industries), and Acme has two affiliate IDs.
`Clients` table
--------------------------------------
ID  Company_ID  Referring_affiliate_id
--  ----------  ----------------------
1   2           1
2   3           1
3   4           3
Each company can only be a client once.
Three of our companies are clients (Spacely Space Sprockets, Cogswell Cogs, and Stark Industries, which is also an affiliate).
In all three cases, they were referred to us by Acme, Inc., using one of its two affiliate IDs.
`Contacts` table
-----------------------------------------
ID  Name            Email
--  --------------  ---------------------
1   Wylie Coyote    wcoyote@acme.com
2   Cosmo Spacely   boss@spacely.com
3   H. G. Cogswell  ceo@cogs.com
4   Tony Stark      tony@stark.com
5   Homer Simpson   simpson@burnscorp.com
Each company has at least one contact, but in this table, there is no indication of which company each contact works for, and there’s also an extra contact (#5). We’ll get to that in a moment.
Each of these contacts may or may not have a login account on the system.
`Contacts_type` table
--------------------------------------
contact_id  company_id  contact_type
----------  ----------  --------------
1           1           Administrative
2           2           Administrative
3           3           Administrative
4           4           Administrative
5           1           Technical
4           2           Technical
Associates a contact with one or more companies.
Each contact is associated with a company. In addition, contact 5 (Homer Simpson) is a technical contact for Acme, Inc., and contact 4 (Tony Stark) is both an administrative contact for company 4 (Stark Industries) and a technical contact for company 3 (Cogswell Cogs).
`Users` table
-------------------------------------------------------------------------------------
ID  contact_id  company_id  client_id  affiliate_id  user_id   password  access_level
--  ----------  ----------  ---------  ------------  --------  --------  ------------
1   1           1           1          1             wylie     A03BA951  2
2   2           2           2          NULL          cosmo     BF16DA77  3
3   3           3           3          NULL          cogswell  39F56ACD  3
4   4           4           4          2             ironman   DFA9301A  2
The users table is essentially a list of contacts that are allowed to login to the system.
Zero or one user per contact; one contact per user.
Contact 1 (Wylie Coyote) works for company 1 (Acme) and is a customer (1) and also an affiliate (1).
Contact 2 (Cosmo Spacely) works for company 2 (Spacely Space Sprockets) and is a customer (2) but not an affiliate.
etc…
NOW finally onto the problem, if there is one…
Do I have a circular reference via the client_id and affiliate_id columns in the Users table? Is this a bad thing? I’m having a hard time wrapping my head around this.
When someone logs in, the system checks their credentials against the users table and uses users.contact_id, users.client_id, and users.affiliate_id to do a quick lookup rather than joining a string of tables to find the same information. But this duplicates data.
Without client_id in the users table, I would have to find the following information out like this:
affiliate_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `affiliates`.`company_id`
client_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `clients`.`company_id`
company_id: join `users`.`contact_id` to `contacts_types`.`company_id` to `company`.`company_id`
user's name: join `users`.`contact_id` to `contacts_types`.`contact_id` to `contacts`.`contact_id` > `name`
In each case, I wouldn’t necessarily know if the user even has an entry in the affiliate table or the clients table, because they likely have an entry in only one of those tables and not both.
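The join paths listed above can be collapsed into a single query. What follows is only a sketch under some assumptions: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere, the table and column names are lowercased versions of the ones in the question, the `users` table is shown without the shortcut columns, and `lookup` is a hypothetical helper name. The `LEFT JOIN`s are what cope with a user who has no row in `clients` or `affiliates`: the missing side simply comes back as NULL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the SELECT below is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company       (id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE affiliates    (id INTEGER PRIMARY KEY, company_id INTEGER);
CREATE TABLE clients       (id INTEGER PRIMARY KEY, company_id INTEGER UNIQUE);
CREATE TABLE contacts      (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contacts_type (contact_id INTEGER, company_id INTEGER, contact_type TEXT);
-- users carries no company_id / client_id / affiliate_id shortcuts here
CREATE TABLE users         (id INTEGER PRIMARY KEY, contact_id INTEGER UNIQUE, user_id TEXT);

INSERT INTO company VALUES (1, 'Acme, Inc.'), (2, 'Spacely Space Sprockets'),
                           (3, 'Cogswell Cogs'), (4, 'Stark Industries');
INSERT INTO affiliates VALUES (1, 1), (2, 4), (3, 1);
INSERT INTO clients    VALUES (1, 2), (2, 3), (3, 4);
INSERT INTO contacts   VALUES (1, 'Wylie Coyote'), (2, 'Cosmo Spacely'),
                              (3, 'H. G. Cogswell'), (4, 'Tony Stark'),
                              (5, 'Homer Simpson');
INSERT INTO contacts_type VALUES (1, 1, 'Administrative'), (2, 2, 'Administrative'),
                                 (3, 3, 'Administrative'), (4, 4, 'Administrative'),
                                 (5, 1, 'Technical'),      (4, 2, 'Technical');
INSERT INTO users VALUES (1, 1, 'wylie'), (2, 2, 'cosmo'),
                         (3, 3, 'cogswell'), (4, 4, 'ironman');
""")

LOOKUP_SQL = """
SELECT u.user_id, c.name, co.company_name,
       cl.id AS client_id, a.id AS affiliate_id
FROM users u
JOIN contacts c        ON c.id = u.contact_id
JOIN contacts_type ct  ON ct.contact_id = u.contact_id
JOIN company co        ON co.id = ct.company_id
LEFT JOIN clients cl   ON cl.company_id = ct.company_id   -- NULL if not a client
LEFT JOIN affiliates a ON a.company_id = ct.company_id    -- NULL if not an affiliate
WHERE u.user_id = ?
ORDER BY co.id, a.id
"""

def lookup(user_id):
    """Return one row per (company, affiliate ID) the login is linked to."""
    return conn.execute(LOOKUP_SQL, (user_id,)).fetchall()

for row in lookup("wylie"):
    print(row)
```

With the sample data, `lookup("wylie")` yields two rows because Acme has two affiliate IDs, and a contact attached to more than one company gets one row per company. A single `affiliate_id` column in `users` cannot represent either case.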
Is it better to do these kinds of joins and thread through multiple tables to get the information I want, or is it better to have a “shortcut” field to get me the information I want?
I have a feeling that overall, this is overly complicated in some way, but I don’t see how.
I’m using MySQL.
it’s better to do the joins. you should only be denormalizing your data when you have timed evidence of a slow response.
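One way to see the cost of the shortcut: `users.client_id` is a second copy of a fact that the other tables already hold, so the two copies can drift apart. The sketch below is illustrative only. It uses Python's `sqlite3` instead of MySQL, trims the tables down to the columns involved, uses made-up shortcut values that start out consistent with the `clients` table, and `stale_shortcuts` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients       (id INTEGER PRIMARY KEY, company_id INTEGER UNIQUE);
CREATE TABLE contacts_type (contact_id INTEGER, company_id INTEGER);
-- client_id is the denormalized shortcut column
CREATE TABLE users         (id INTEGER PRIMARY KEY, contact_id INTEGER UNIQUE,
                            user_id TEXT, client_id INTEGER);

INSERT INTO clients       VALUES (1, 2), (2, 3), (3, 4);
INSERT INTO contacts_type VALUES (2, 2), (3, 3), (4, 4);
INSERT INTO users         VALUES (1, 2, 'cosmo', 1), (2, 3, 'cogswell', 2),
                                 (3, 4, 'ironman', 3);
""")

# IS NOT is SQLite's NULL-safe inequality; in MySQL the equivalent test
# would be written NOT (u.client_id <=> cl.id).
STALE_SQL = """
SELECT u.user_id, u.client_id AS stored, cl.id AS derived
FROM users u
JOIN contacts_type ct ON ct.contact_id = u.contact_id
LEFT JOIN clients cl  ON cl.company_id = ct.company_id
WHERE u.client_id IS NOT cl.id
"""

def stale_shortcuts():
    """List users whose stored client_id disagrees with the joined value."""
    return conn.execute(STALE_SQL).fetchall()

before = stale_shortcuts()

# Contact 3 moves from company 3 to company 4, but nobody updates users.client_id.
conn.execute("UPDATE contacts_type SET company_id = 4 WHERE contact_id = 3")
after = stale_shortcuts()

print(before)
print(after)
```

Every write path that touches `contacts_type` or `clients` would have to remember to fix the shortcut as well, which is the maintenance burden the joins avoid.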
having said that, there are various ways to reduce the amount of typing:
it’s possible mysql doesn’t support all the above – you’ll need to check the docs [update: ok, recent mysql seems to support views, but not “with”. so you can add views to do the work of affiliate_id, client_id etc and treat them just like tables in your queries, but keeping the underlying data nicely organised.]
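A rough sketch of the view idea, again run through Python's `sqlite3` so it is self-contained; the `CREATE VIEW` statements themselves are plain SQL that MySQL also accepts. The view names `user_clients` and `user_affiliates` and the helper functions are hypothetical, and the tables are trimmed to the columns the views need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE affiliates    (id INTEGER PRIMARY KEY, company_id INTEGER);
CREATE TABLE clients       (id INTEGER PRIMARY KEY, company_id INTEGER UNIQUE);
CREATE TABLE contacts_type (contact_id INTEGER, company_id INTEGER);
CREATE TABLE users         (id INTEGER PRIMARY KEY, contact_id INTEGER UNIQUE, user_id TEXT);

INSERT INTO affiliates    VALUES (1, 1), (2, 4), (3, 1);
INSERT INTO clients       VALUES (1, 2), (2, 3), (3, 4);
INSERT INTO contacts_type VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 1), (4, 2);
INSERT INTO users         VALUES (1, 1, 'wylie'), (2, 2, 'cosmo'),
                                 (3, 3, 'cogswell'), (4, 4, 'ironman');

-- Each view hides one join chain; nothing is stored twice.
CREATE VIEW user_clients AS
SELECT u.user_id, cl.id AS client_id
FROM users u
JOIN contacts_type ct ON ct.contact_id = u.contact_id
JOIN clients cl       ON cl.company_id = ct.company_id;

CREATE VIEW user_affiliates AS
SELECT u.user_id, a.id AS affiliate_id
FROM users u
JOIN contacts_type ct ON ct.contact_id = u.contact_id
JOIN affiliates a     ON a.company_id = ct.company_id;
""")

def client_ids(user_id):
    """Query the view exactly as if it were a table."""
    rows = conn.execute(
        "SELECT client_id FROM user_clients WHERE user_id = ? ORDER BY client_id",
        (user_id,),
    )
    return [r[0] for r in rows]

def affiliate_ids(user_id):
    rows = conn.execute(
        "SELECT affiliate_id FROM user_affiliates WHERE user_id = ? ORDER BY affiliate_id",
        (user_id,),
    )
    return [r[0] for r in rows]

print(client_ids("cogswell"), affiliate_ids("wylie"))
```

The login code then reads from `user_clients` and `user_affiliates` with a one-line query each, while the underlying tables stay normalized. An empty result simply means the user's company is not a client (or not an affiliate).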