In Stack Overflow podcast no. 19, Joel describes Fog Creek’s decision to have one database PER client instead of one database for ALL clients. That got me thinking about the following scenario.
- Assuming I have 1000 users.
- Each user has 100 clients.
- Each client has 1000 products.
So that means I’m going to have 1000 x 100 x 1000 = 100,000,000 product rows associated with users. Now if I run a join query for a single user and all of his clients’ products, what would be a reasonable query time if I use just a single database for this?
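To make the single-database design concrete, here is a minimal sketch of the kind of schema and join I have in mind. The table and column names are my own invention, and I’m using SQLite purely for illustration; the point is that with indexes on the foreign keys, the join only ever touches one user’s ~100,000 rows, regardless of the 100,000,000-row total.

```python
import sqlite3

# Hypothetical schema for the single-database design described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE clients  (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
CREATE TABLE products (id INTEGER PRIMARY KEY, client_id INTEGER REFERENCES clients(id));
-- Indexes on the foreign keys are what keep the join cheap at 100M rows.
CREATE INDEX idx_clients_user    ON clients(user_id);
CREATE INDEX idx_products_client ON products(client_id);
""")

# Tiny sample: 1 user, 2 clients, 3 products each.
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany("INSERT INTO clients VALUES (?, 1)", [(1,), (2,)])
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(i, 1 + i % 2) for i in range(6)])

# All products for one user: two indexed joins, scanning only that
# user's rows rather than the whole products table.
count = conn.execute("""
    SELECT COUNT(*)
    FROM products p
    JOIN clients c ON c.id = p.client_id
    JOIN users   u ON u.id = c.user_id
    WHERE u.id = ?
""", (1,)).fetchone()[0]
print(count)  # 6
```

With that indexing in place, per-user query time should scale with the user’s own data (here ~100,000 rows), not with the total row count, which is why a single well-indexed database is usually the first thing to try.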
UPDATE
Maybe I wasn’t clear enough in my question. Assume I need to run all sorts of funky queries (min, max, group by, etc.) over the dataset described above. Would it be slow enough that a multiple-database strategy makes better sense, e.g. one DB per client, database sharding, etc.?
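For the sake of argument, here is a sketch of one of those “funky” aggregate queries against a single shared table. Again the schema, the `price` column, and the index are my own assumptions, but they show how a composite index scoped by `user_id` lets a per-user GROUP BY ignore every other user’s rows:

```python
import sqlite3

# Hypothetical single-table sketch of the aggregate queries from the
# question: min, max, and GROUP BY over one user's products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY,
                       user_id INTEGER, client_id INTEGER, price REAL);
CREATE INDEX idx_products_user ON products(user_id, client_id);
""")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 10.0), (2, 1, 1, 30.0), (3, 1, 2, 20.0), (4, 2, 9, 99.0)])

# Per-client aggregates for user 1 only; the composite index lets the
# planner skip every other user's rows entirely.
rows = conn.execute("""
    SELECT client_id, MIN(price), MAX(price), COUNT(*)
    FROM products
    WHERE user_id = ?
    GROUP BY client_id
    ORDER BY client_id
""", (1,)).fetchall()
print(rows)  # [(1, 10.0, 30.0, 2), (2, 20.0, 20.0, 1)]
```

If every aggregate is scoped to one user (or one client), a single database with the right composite indexes behaves much like a per-client database would; sharding mainly pays off once a single machine can no longer hold the working set.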
I imagine the answer depends on your choice of DBMS. With Oracle, for example, one big database would definitely be preferable; in fact, 1000 identical databases would be considered absurd and unmanageable.
Also, would you never need to run queries across users? For example, find the user with the most products. Or are these really 1000 discrete ‘private’ databases, with no one having overall access to the data? Even then, Oracle for example offers ‘Virtual Private Database’ to cater for that within a single database.