I need some best-practice and performance advice.
Let’s say I have three tables: Employees, Jobs and Ranks.
Every employee has a job and a rank, so obviously I should reference those tables in my Employees table.
My question is, which of these options is best:
1) Each job and rank is stored with a unique ID paired with a descriptive name. The Employees table references the unique ID in the other table, thus saving memory (the descriptive name is only saved once in the Jobs/Ranks table), but to see the descriptive names I’ll need to do JOINs:
SELECT Employees.EMPL_ID, Ranks.R_NAME, Jobs.J_NAME
FROM Employees
JOIN Ranks ON Ranks.R_ID=Employees.RANK
JOIN Jobs ON Jobs.J_ID=Employees.JOB
2) Just unique descriptive names. This can waste memory, because I repeatedly store the descriptive name of each rank / job, but I save time on my SELECT statements.
<EDIT:>
Just to clarify, my main concern is the performance cost of running SELECTs with multiple JOINs instead of a single plain SELECT statement.
I want to be able to deal with lots of traffic, specifically employees’ requests to see their own job and rank.
<EDIT>
Examples:
Option 1 (IDs and names):
Employees:
 __________________________
/ EMPL_ID  | RANK   | JOB  \
| 1        | 2      | 3    |
| 1        | 1      | 3    |
| 1        | 1      | 1    |
\__________|________|______/
Ranks:
 __________________
/ R_ID   | R_NAME  \
| 1      | GRUNT   |
| 2      | BOSS    |
\________|_________/
Jobs:
 ____________________
/ J_ID   | J_NAME    \
| 1      | JANITOR   |
| 3      | PRESIDENT |
\________|___________/
Option 2 (unique names):
Employees:
 _______________________________
/ EMPL_ID  | RANK   | JOB       \
| 1        | BOSS   | PRESIDENT |
| 1        | GRUNT  | PRESIDENT |
| 1        | GRUNT  | JANITOR   |
\__________|________|___________/
Ranks:
 __________
/ R_NAME   \
| GRUNT    |
| BOSS     |
\__________/
Jobs:
 ___________
/ J_NAME    \
| JANITOR   |
| PRESIDENT |
\___________/
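To make the two layouts concrete, here is a minimal sketch using Python's built-in sqlite3 module. Several things are assumptions for illustration only: the employee IDs 1 to 3 (the tables above repeat EMPL_ID 1, which cannot serve as a key), the table names RankNames, JobNames and EmployeesByName for Option 2, and the choice of SQLite itself. It only shows that both layouts return the same rows; it says nothing about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
-- Option 1: numeric IDs in Employees, names stored once in lookup tables.
CREATE TABLE Ranks (R_ID INTEGER PRIMARY KEY, R_NAME TEXT NOT NULL UNIQUE);
CREATE TABLE Jobs  (J_ID INTEGER PRIMARY KEY, J_NAME TEXT NOT NULL UNIQUE);
CREATE TABLE Employees (
    EMPL_ID INTEGER PRIMARY KEY,
    RANK    INTEGER NOT NULL REFERENCES Ranks(R_ID),
    JOB     INTEGER NOT NULL REFERENCES Jobs(J_ID)
);
INSERT INTO Ranks VALUES (1, 'GRUNT'), (2, 'BOSS');
INSERT INTO Jobs  VALUES (1, 'JANITOR'), (3, 'PRESIDENT');
-- EMPL_ID values 1..3 are assumed, not taken from the question.
INSERT INTO Employees VALUES (1, 2, 3), (2, 1, 3), (3, 1, 1);

-- Option 2: the descriptive name itself is the key (hypothetical table names).
CREATE TABLE RankNames (R_NAME TEXT PRIMARY KEY);
CREATE TABLE JobNames  (J_NAME TEXT PRIMARY KEY);
CREATE TABLE EmployeesByName (
    EMPL_ID INTEGER PRIMARY KEY,
    RANK    TEXT NOT NULL REFERENCES RankNames(R_NAME),
    JOB     TEXT NOT NULL REFERENCES JobNames(J_NAME)
);
INSERT INTO RankNames VALUES ('GRUNT'), ('BOSS');
INSERT INTO JobNames  VALUES ('JANITOR'), ('PRESIDENT');
INSERT INTO EmployeesByName VALUES
    (1, 'BOSS', 'PRESIDENT'), (2, 'GRUNT', 'PRESIDENT'), (3, 'GRUNT', 'JANITOR');
""")

# Option 1 needs two JOINs to show the names.
option1_rows = conn.execute("""
    SELECT Employees.EMPL_ID, Ranks.R_NAME, Jobs.J_NAME
    FROM Employees
    JOIN Ranks ON Ranks.R_ID = Employees.RANK
    JOIN Jobs  ON Jobs.J_ID  = Employees.JOB
    ORDER BY Employees.EMPL_ID
""").fetchall()

# Option 2 reads the names straight from the employee row.
option2_rows = conn.execute(
    "SELECT EMPL_ID, RANK, JOB FROM EmployeesByName ORDER BY EMPL_ID"
).fetchall()

print(option1_rows)
print(option2_rows)
```

The foreign keys in the Option 2 half are there so that a misspelled rank or job name is rejected, which is the main thing the separate Ranks and Jobs tables still buy you in that layout.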
Yes, always give each row a unique ID.
Best practice is to have one in every table.
It is usually called ‘id’ or ‘<table name>_id’.
It should have no business value.
Many ‘guaranteed unique’ records later turn out to need, or already contain, duplicates, and having a separate unique primary key helps hugely when this is discovered.
One example of ‘unique’ that isn’t: if a system stores people’s Social Security Numbers, they should be unique. However, one could be mistyped. Then, when the person who really owns the mistyped number shows up and their number is typed in, the system sees a duplicate. To allow and resolve this, it is really helpful for every row to have its own ID that is not the SSN and has no business value at all other than identifying the row.
Unique records are a very well-known problem. Having a unique ID for every record is part of the solutions that address it.
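The SSN scenario can be sketched like this, again with Python's sqlite3. The people and payments tables, their columns, and the placeholder SSN strings are all hypothetical and only illustrate the idea: because other tables point at the surrogate id, a mistyped SSN can coexist with the real one and be corrected later without touching any referencing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE people (
    id   INTEGER PRIMARY KEY,  -- surrogate key, no business meaning
    ssn  TEXT NOT NULL,        -- deliberately not UNIQUE: typos can collide
    name TEXT NOT NULL
);
CREATE TABLE payments (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES people(id),
    amount    INTEGER NOT NULL
);
-- Alice's SSN is mistyped and happens to equal Bob's real number.
INSERT INTO people (id, ssn, name) VALUES (1, '000-00-0002', 'Alice');
INSERT INTO payments (person_id, amount) VALUES (1, 100);
-- Bob shows up later with the number that is genuinely his.
INSERT INTO people (id, ssn, name) VALUES (2, '000-00-0002', 'Bob');
INSERT INTO payments (person_id, amount) VALUES (2, 250);
""")

# Resolve the duplicate by correcting Alice's row; payments are untouched
# because they reference people.id, not the SSN.
conn.execute("UPDATE people SET ssn = '000-00-0001' WHERE id = 1")

payments_by_name = conn.execute("""
    SELECT people.name, people.ssn, payments.amount
    FROM payments
    JOIN people ON people.id = payments.person_id
    ORDER BY people.id
""").fetchall()
print(payments_by_name)
```

Had the SSN been the primary key, the second insert would have failed outright, and fixing Alice's number would have meant updating every table that referenced it.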
The exception to all of the above is performance. I am not too concerned about join speed for a few thousand records, as SQL databases are well designed to do that quickly. I have found that the advantages of unique identification outweigh the disadvantages. There may be cases where you change the above practice due to performance requirements. For instance, if millions of records have to be loaded into memory, the space taken by the unique IDs may become an issue. Often in these cases, though, folks start to look at NoSQL solutions like Redis, MongoDB, etc.
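One way to check the join cost on your own data rather than guess is to ask the database for its query plan. The sketch below does this in SQLite via Python's sqlite3, using the question's Option 1 schema and an assumed lookup of a single employee by EMPL_ID. The exact wording of the plan differs between SQLite versions and other databases have their own EXPLAIN syntax, so treat it as an illustration of the approach, not as a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ranks (R_ID INTEGER PRIMARY KEY, R_NAME TEXT NOT NULL);
CREATE TABLE Jobs  (J_ID INTEGER PRIMARY KEY, J_NAME TEXT NOT NULL);
CREATE TABLE Employees (
    EMPL_ID INTEGER PRIMARY KEY,
    RANK    INTEGER NOT NULL REFERENCES Ranks(R_ID),
    JOB     INTEGER NOT NULL REFERENCES Jobs(J_ID)
);
""")

# Ask SQLite how it would run "show one employee their job and rank".
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT e.EMPL_ID, r.R_NAME, j.J_NAME
    FROM Employees AS e
    JOIN Ranks AS r ON r.R_ID = e.RANK
    JOIN Jobs  AS j ON j.J_ID = e.JOB
    WHERE e.EMPL_ID = ?
""", (1,)).fetchall()

# The human-readable description is the last column of each plan row.
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

If each step is reported as a primary-key search instead of a table scan, the two JOINs amount to two extra keyed lookups per request.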
Here are some additional references on SO and other sites:
What's the best practice for primary keys in tables?
In general, should every table in a database have an identity field to use as a PK?
http://www.sql-server-performance.com/forum/threads/do-i-need-a-unique-identifier-or-identity-column.16910/
Is an ID column really needed in SQL?
As one answer comments, “use of natural vs. surrogate keys is kind of a religious debate in the community”. There’s also a comment about how the answerer got their ‘rules’… tee-hee…