Is there a general answer to what works best when the database is really big: a MongoDB cluster made up of a large number of smaller blade servers, or a few really fat servers?
Assume the shard key has quite fine granularity, so splitting should not be a problem.
If there is no "silver bullet", what are the pros and cons of either setup?
Best in what aspect? From a financial point of view, I’d go for lots of cheap hardware 🙂
MongoDB has been built to easily scale across nodes, so why not take advantage of this? The reason you’d want just one or a few beefy servers for a SQL server is to minimize the spread of relational data across physical nodes. But since MongoDB uses documents, most of your related data is stored in a single document. This means that it’s all stored at the same physical location and you don’t have to do costly lookups on other nodes to reconstruct the ‘complete picture’ of your data.
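To make the difference concrete, here is a minimal sketch in plain Python (no MongoDB driver involved). The node names, the order data, and both loader functions are made up for illustration; the only point is to count how many physical locations have to be visited to rebuild one order.

```python
# Hypothetical relational layout: the rows that make up one order live in
# separate tables, which could end up on different physical nodes.
relational_nodes = {
    "node_a": {"orders": [{"order_id": 1, "customer_id": 7}]},
    "node_b": {"customers": [{"customer_id": 7, "name": "Ada"}]},
    "node_c": {"order_items": [
        {"order_id": 1, "sku": "X1", "qty": 2},
        {"order_id": 1, "sku": "Y9", "qty": 1},
    ]},
}

# The same order as one document with the related data embedded in it.
document_nodes = {
    "shard_1": {"orders": [{
        "_id": 1,
        "customer": {"customer_id": 7, "name": "Ada"},
        "items": [{"sku": "X1", "qty": 2}, {"sku": "Y9", "qty": 1}],
    }]},
}


def load_order_relational(order_id):
    """Rebuild the order from three tables and record the nodes visited."""
    touched = []
    order = next(r for r in relational_nodes["node_a"]["orders"]
                 if r["order_id"] == order_id)
    touched.append("node_a")
    customer = next(r for r in relational_nodes["node_b"]["customers"]
                    if r["customer_id"] == order["customer_id"])
    touched.append("node_b")
    items = [{"sku": r["sku"], "qty": r["qty"]}
             for r in relational_nodes["node_c"]["order_items"]
             if r["order_id"] == order_id]
    touched.append("node_c")
    return {"_id": order_id, "customer": customer, "items": items}, touched


def load_order_document(order_id):
    """Read the whole order from the single shard that holds the document."""
    doc = next(d for d in document_nodes["shard_1"]["orders"]
               if d["_id"] == order_id)
    return doc, ["shard_1"]
```

Both loaders return the same order, but the relational one visits three nodes and the document one visits a single shard.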
Another thing to keep in mind is that map-reduce jobs can only run in parallel in a sharded environment. So if you plan to do a lot of map-reducing, more shards/servers will result in better performance.
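The idea can be sketched without MongoDB at all. The snippet below is a plain-Python simulation, not MongoDB's own map-reduce command: the `shards` list, the `tag` field, and both function names are assumptions for illustration. Each shard maps and reduces its own documents at the same time as the others, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the documents on one shard.
shards = [
    [{"tag": "a"}, {"tag": "b"}, {"tag": "a"}],
    [{"tag": "b"}, {"tag": "c"}],
    [{"tag": "a"}],
]


def map_reduce_on_shard(docs):
    """Map each document to its tag and reduce to per-tag counts locally."""
    return Counter(doc["tag"] for doc in docs)


def sharded_tag_counts(all_shards):
    """Run the per-shard step concurrently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=len(all_shards)) as pool:
        partials = list(pool.map(map_reduce_on_shard, all_shards))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)
```

With more shards, each one holds fewer documents, so the per-shard step has less work to do before the final merge.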
What if your database outgrows your beefy servers? Are you going to invest in another beefy server that handles that small amount of extra growth? Or what if one of them crashes? With smaller and cheaper servers, you can scale up (or down) more gradually if the need arises. Also, the impact of a server crash is much smaller, as it will affect only a small portion of your data.
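A rough back-of-the-envelope sketch of both points follows. It assumes data is spread evenly over the servers, all servers in a cluster are the same size, and there is no replication; the function names and the 40 TB / 2 TB figures are made up for illustration.

```python
import math


def crash_impact(num_servers):
    """Fraction of the data that becomes unavailable when one server dies,
    assuming an even spread and no replicas."""
    return 1 / num_servers


def servers_to_add(total_tb, num_servers, growth_tb):
    """Number of equally sized servers needed to absorb the growth, and the
    capacity in TB that actually gets bought."""
    per_server_tb = total_tb / num_servers
    count = math.ceil(growth_tb / per_server_tb)
    return count, count * per_server_tb


# 40 TB of data that needs room for 2 TB more:
fat = servers_to_add(40, 4, 2)     # 4 fat servers of 10 TB each
small = servers_to_add(40, 40, 2)  # 40 small servers of 1 TB each
```

Under these assumptions the fat cluster has to buy a whole 10 TB server to cover 2 TB of growth and loses a quarter of its data when one machine crashes, while the small cluster buys exactly 2 TB and loses 2.5%.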
To summarize: a large cluster of smaller servers isn’t a silver bullet, as managing such a cluster has its own challenges, but it is significantly cheaper and possibly faster as well if you’re doing map-reduce.