Let’s say I have a table that represents users: id, name. The table is huge, about 100 million rows.
Users also have a property, let’s say city of birth. This is an optional field, so only a small share of users (let’s say 5%) have provided it. I also have a table of cities: id, name. The relation is one-to-many: a user can have only one city, and a city can be the birthplace of many users.
The question is: how to connect them?
a) Add a column city_id to the users table (doomed to hold 95 million NULLs for the users who don’t have the property).
b) Create a third, junction table user_city: user_id, city_id (with the purpose of avoiding the huge number of NULLs in a).
Also, please, keep in mind that the application needs to
select user.name ... where city_id=xxx
So the city_id column needs to be indexed in either case.
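For concreteness, here is a minimal sketch of both options side by side, using Python's built-in sqlite3 module. The table names users_a and users_b, the index names, and the sample rows are all made up for illustration; only cities, user_city, city_id and user_id come from the question. It shows the shape of each schema and query, not how either performs at 100 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Option (a): nullable city_id column directly on the users table.
    CREATE TABLE users_a (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        city_id INTEGER REFERENCES cities(id)  -- NULL when not provided
    );
    CREATE INDEX idx_users_a_city ON users_a(city_id);

    -- Option (b): users table without the column, plus a junction table.
    CREATE TABLE users_b (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_city (
        user_id INTEGER PRIMARY KEY REFERENCES users_b(id),  -- one city per user
        city_id INTEGER NOT NULL REFERENCES cities(id)
    );
    CREATE INDEX idx_user_city_city ON user_city(city_id);
""")

# Hypothetical sample data: (id, name, city_id or None).
cities = [(1, "Oslo"), (2, "Lima")]
users = [(1, "Ann", 1), (2, "Bob", None), (3, "Cy", 1), (4, "Dee", 2), (5, "Eve", None)]

conn.executemany("INSERT INTO cities VALUES (?, ?)", cities)
conn.executemany("INSERT INTO users_a VALUES (?, ?, ?)", users)
conn.executemany("INSERT INTO users_b VALUES (?, ?)", [(u, n) for u, n, _ in users])
conn.executemany(
    "INSERT INTO user_city VALUES (?, ?)",
    [(u, c) for u, _, c in users if c is not None],
)

def names_option_a(city_id):
    """Option (a): a single indexed lookup on the users table."""
    rows = conn.execute(
        "SELECT name FROM users_a WHERE city_id = ? ORDER BY id", (city_id,)
    )
    return [r[0] for r in rows]

def names_option_b(city_id):
    """Option (b): indexed lookup on user_city, then a join back to users."""
    rows = conn.execute(
        "SELECT u.name FROM user_city uc "
        "JOIN users_b u ON u.id = uc.user_id "
        "WHERE uc.city_id = ? ORDER BY u.id",
        (city_id,),
    )
    return [r[0] for r in rows]

print(names_option_a(1), names_option_b(1))
```

Both functions return the same names for a given city; the difference is that option (b) needs a join to get from city_id to user.name.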
Because any non-alien user has only one birth city (unless he was born in a taxi), it seems silly and wasteful to have a table of birth city indexed by User ID. I would put birth city right in the user table where (as I claim) it belongs, notwithstanding that most city fields will be NULL.
But, forgetting my mere opinion, this is the classic time vs. space problem, the space consideration being the millions of extraneous, useless NULLs; and the extra time being the millions of extraneous, useless SELECTs into the city table.
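To put rough numbers on the space side, here is a back-of-the-envelope sketch in Python using the figures from the question (100 million users, 5% with a city). The function name is made up for illustration. It counts rows only; how many bytes a NULL actually occupies depends on the database engine and is deliberately not modeled here.

```python
TOTAL_USERS = 100_000_000   # from the question
PERCENT_WITH_CITY = 5       # from the question

def row_counts(total_users, percent_with_city):
    """Count NULL city_id values in option (a) and rows in user_city for option (b)."""
    with_city = total_users * percent_with_city // 100
    return {
        "nulls_in_option_a": total_users - with_city,
        "rows_in_user_city": with_city,
    }

counts = row_counts(TOTAL_USERS, PERCENT_WITH_CITY)
print(counts)
```

So option (a) carries 95 million NULLs, while option (b) adds a 5-million-row table with its own indexes. Which one costs more on disk is something to measure on the actual engine.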
What does your solution to that problem tell you?