I need to make connections between several tables. And I need to decide whether I should make several separate tables with only two ids or one table with all the connections.
The tables are:
- movie – 50000 rows
- photo_set – 50000 rows – each photo set is related to a movie, but there can be some photosets without movies or movies without photosets
- dvd – 13000 rows – there can be obly 3-5 movies on each DVD, some movies are not connected to any DVD
- category – 50 rows – there can be 1-5 categories for each movie. Category for a all movies and their photosets in one DVD are the same
- actor – 15000 rows – there can be 2-5 actors in one movie
The question is: how to calculate the number of rows in a table with such fields?
movie_id (default=NULL)photo_set_id (default=NULL)dvd_id (default=NULL)category_idactor_id
I think that if I will use only one table with all the connections it might be much faster to get all the connections in one query. Of course if this one table will be not very much larger than several connection tables.
If this is not a good idea to use one table, please tell me, what will be the advantage of using several connection tables in my scenario?
You should make several connection tables, one for each many-to-many relationship.
The reason for having several connection tables is the same as for having several data tables.
You could store movies, photos and so on in the same table. But you don’t, because when designing your database you strive for the third normal form.
Each kind of object should have it’s own table.
The relation between movie and actor is not the same as the relation between movie and photoset. Let’s say that you want to add attributes to the relationships. Maybe you want to store which character the actor played in he movie, or maybe you want to name the photosets (“Pictures from the Oscar awards”, “Screenshots from the movie” etc).
My point is that it will be easier to extend the application if you make one table per relationship.
Estimating the number of rows:
The number of rows in each table will be the number of rows in the driving table times the average number of connected items.
Example:
Let’s assume the average number of actors is 3 per movie.
Then there will be 50000 * 3 = 150000 rows in the movie_actor table.