I found this thread which helps my understanding somewhat, but does not answer my question:
SQL: Using NULL values vs. default values
My Question:
If I am creating a schema (in an MS Access Database) that is designed to store contact information for employees, would it be better to have a single table for telephone numbers, then a single table for addresses, then another single table for email addresses, OR would it be better to have a single table that stores all of these records, but might have NULL values for several of the fields in more than half of the records?
I would like to store the different elements of a street address in separate fields. For addresses: one field for the street number and name, one field for the city, one for the state, one for the country, one for the zip code, one for any other name for the address (“ATTN:” or similar), and maybe more. For telephone numbers: essentially one field for a name and one for a number. For emails: essentially the same as telephone, a name and an address. In a single combined table, this would leave many NULL/blank values in the telephone number records; in fact, I would estimate that probably 70% of the records would have 5 or more NULL values, on a scale of 5,000 to 10,000 records.
I would want to be able to display them both in separate lists as well as in a combined list, filtered and grouped. Either structure could support this (through JOINS/UNIONS and WHERE clauses). In terms of simplicity of table structure, a single list would seem obvious – ONE table is “neater” than three or more tables.
The answer, I think, should hinge on the efficiency of “storing” potentially tens of thousands of NULL values vs. the efficiency of indexing different tables, and spending time ensuring UNIONs line up with datatypes and constructing various other methods to combine data that is already SOMEWHAT related.
I hope I have presented my thoughts clearly enough! I welcome links, answers, and comments as well as questions.
I would approach the design with a bias favoring separate tables for each entity class. Person is an entity class. If you have no more than a single phone number for each person, you can store it as an attribute of the Persons table.
However, what I usually see is the desire for the flexibility to store multiple types of phone numbers for each person: home; work; cell; fax; etc. Storing those in a single table (Person_ID, work_phone, home_phone, cell_phone) leads to a brittle design. When the managers tell you to add a field for another phone number type, you’re forced to revise the table structure, as well as queries, forms, and reports which use that table.
I would lean towards a separate table with a one-to-many relationship between People and PhoneNumbers, so that each phone number and its type is a separate row in the PhoneNumbers table. That design avoids the brittleness of the single-table approach. It also avoids your concern over storing so many Null values: if there is no phone number for a Person, you simply don’t have a row for that Person in PhoneNumbers.
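To make that concrete, here is a minimal sketch of the one-to-many layout. It uses Python's built-in sqlite3 module as a stand-in for Access, so the DDL syntax will differ somewhat from what Access expects. The table and column names (People, PhoneNumbers, phone_type, and so on) and the sample data are my own illustrations, not anything from your schema.

```python
import sqlite3

# In-memory SQLite stands in for Access here; all names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE People (
    person_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
CREATE TABLE PhoneNumbers (
    phone_id     INTEGER PRIMARY KEY,
    person_id    INTEGER NOT NULL REFERENCES People(person_id),
    phone_type   TEXT NOT NULL,   -- 'home', 'work', 'cell', 'fax', ...
    phone_number TEXT NOT NULL
);
""")

conn.executemany(
    "INSERT INTO People (person_id, full_name) VALUES (?, ?)",
    [(1, "Ann Example"), (2, "Bob Example")],
)

# Ann has two numbers; Bob has none, so he simply has no rows here.
conn.executemany(
    "INSERT INTO PhoneNumbers (person_id, phone_type, phone_number) VALUES (?, ?, ?)",
    [(1, "work", "555-0100"), (1, "cell", "555-0101")],
)

# A new phone type is just another row; the table structure does not change.
conn.execute(
    "INSERT INTO PhoneNumbers (person_id, phone_type, phone_number) VALUES (?, ?, ?)",
    (1, "fax", "555-0102"),
)

phone_rows = conn.execute("SELECT COUNT(*) FROM PhoneNumbers").fetchone()[0]
null_phones = conn.execute(
    "SELECT COUNT(*) FROM PhoneNumbers WHERE phone_number IS NULL"
).fetchone()[0]
bob_rows = conn.execute(
    "SELECT COUNT(*) FROM PhoneNumbers WHERE person_id = 2"
).fetchone()[0]
```

Adding the fax number required no change to the table, and the person without a phone contributes no rows at all rather than a row full of Nulls.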
However, I really don’t know whether this suggestion is appropriate for your situation. I think it depends on the complexity of your data needs.
As for the “convenience” of a single table, that seems inconsequential to me. Access is relational, so you use a query to gather up the related pieces from multiple tables into a full view of the data you need … which can resemble a single table. If you’re deliberately avoiding that relational capability, perhaps you wouldn’t lose much by storing your contact information in a spreadsheet instead.
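As a rough illustration of gathering the pieces with a query, the sketch below builds a combined contact list from separate phone and email tables using JOINs and a UNION ALL. Again, sqlite3 is only a stand-in for Access, and every table name, column name, and sample row is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Access; schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (person_id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
CREATE TABLE PhoneNumbers (
    phone_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES People(person_id),
    phone_type TEXT NOT NULL,
    phone_number TEXT NOT NULL
);
CREATE TABLE Emails (
    email_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES People(person_id),
    email_label TEXT NOT NULL,
    email_address TEXT NOT NULL
);
INSERT INTO People VALUES (1, 'Ann Example'), (2, 'Bob Example');
INSERT INTO PhoneNumbers (person_id, phone_type, phone_number)
    VALUES (1, 'work', '555-0100'), (1, 'cell', '555-0101');
INSERT INTO Emails (person_id, email_label, email_address)
    VALUES (1, 'work', 'ann@example.com'), (2, 'work', 'bob@example.com');
""")

# One query presents the separate tables as a single combined list.
# Both halves of the UNION return the same four text columns, in the same order.
rows = conn.execute("""
SELECT p.full_name AS full_name, 'phone' AS kind,
       ph.phone_type AS label, ph.phone_number AS value
FROM People AS p
JOIN PhoneNumbers AS ph ON ph.person_id = p.person_id
UNION ALL
SELECT p.full_name, 'email', e.email_label, e.email_address
FROM People AS p
JOIN Emails AS e ON e.person_id = p.person_id
ORDER BY 1, 2, 3
""").fetchall()

for row in rows:
    print(row)
```

In Access you could save a query like this and base forms and reports on it, filtering on the kind column when you want only phones or only emails.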