JOIN is a handy feature of SQL databases, but what about large databases (>10 GB)? Consider three two-column tables in a many-to-many relationship, where we want to get the items associated with one single case (e.g. the tags of ONE article).
FACTS (correct me if I’m wrong):
1. For a JOIN, all three tables must fit in memory.
2. A single SELECT by PRIMARY KEY consumes almost no memory.
3. When we have many concurrent read connections, excess connections are kept in a queue (they do not fail or overload the server).
Then isn’t it better to perform three simple SELECT queries? This makes the system a little slower, but I believe it is more efficient than dealing with entire tables that are gigabytes in size.
One may suggest that adding more RAM is the ultimate solution, but I think handling such large tables is still not easy even with extra RAM.
Limiting actions to simple SELECT queries with PRIMARY KEY can be a practical approach to work with large databases efficiently.
If you are claiming that it’s better to run the SELECTs on three separate tables and then join the data yourself outside of the database engine, then you are wrong. The database will do a better job of joining your tables than you can. The tables don’t all have to fit into RAM for a join to work: with indexes on the join columns, the engine only needs to read the rows that match.
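To make the comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table and column names (articles, tags, article_tags) and the helper functions are made up for illustration and do not come from the question, and SQLite is only a stand-in for whatever engine you actually use. It fetches the tags of one article first with a single JOIN and then with separate primary-key SELECTs, and it also asks the engine for its query plan, so you can check on your own system whether the join uses index lookups rather than reading whole tables.

```python
import sqlite3


def build_demo_db():
    """Create a tiny many-to-many schema (hypothetical names) in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE article_tags (
            article_id INTEGER REFERENCES articles(id),
            tag_id INTEGER REFERENCES tags(id),
            PRIMARY KEY (article_id, tag_id)
        );
        INSERT INTO articles VALUES (1, 'Joins'), (2, 'Indexes');
        INSERT INTO tags VALUES (1, 'sql'), (2, 'performance'), (3, 'memory');
        INSERT INTO article_tags VALUES (1, 1), (1, 2), (2, 2), (2, 3);
        """
    )
    return conn


JOIN_SQL = """
    SELECT t.name
    FROM article_tags AS atag
    JOIN tags AS t ON t.id = atag.tag_id
    WHERE atag.article_id = ?
    ORDER BY t.name
"""


def tags_via_join(conn, article_id):
    """One round trip: the engine joins the two tables itself."""
    rows = conn.execute(JOIN_SQL, (article_id,)).fetchall()
    return [name for (name,) in rows]


def tags_via_separate_selects(conn, article_id):
    """The approach from the question: the join is done by hand in the client."""
    tag_ids = [
        tag_id
        for (tag_id,) in conn.execute(
            "SELECT tag_id FROM article_tags WHERE article_id = ?",
            (article_id,),
        )
    ]
    names = []
    for tag_id in tag_ids:  # one extra query per tag
        row = conn.execute(
            "SELECT name FROM tags WHERE id = ?", (tag_id,)
        ).fetchone()
        if row is not None:
            names.append(row[0])
    return sorted(names)


def join_query_plan(conn, article_id):
    """Return the engine's plan for the JOIN as a list of description strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL, (article_id,)).fetchall()
    return [row[3] for row in rows]  # column 3 holds the human-readable detail


if __name__ == "__main__":
    conn = build_demo_db()
    print(tags_via_join(conn, 1))
    print(tags_via_separate_selects(conn, 1))
    for step in join_query_plan(conn, 1):
        print(step)
```

Both functions return the same tags, but the hand-written version issues one query for the link table plus one query per tag, so its cost grows with the number of tags and with every network round trip. The plan output differs between engines and versions, so treat it as something to inspect on your own data rather than a guaranteed result.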