What is an example of data that is “predistilled or aggregated in runtime”? (And why isn’t MongoDB very good with it?)
This is a quote from the MongoDB docs:
Traditional Business Intelligence. Data warehouses are more suited to new, problem-specific BI databases. However note that MongoDB can work very well for several reporting and analytics problems where data is pre-distilled or aggregated in runtime — but classic, nightly batch load business intelligence, while possible, is not necessarily a sweet spot.
Let’s take something simple like counting clicks. There are a few ways to report on clicks.
Now most big systems use #2. There are several systems that are very good for this specifically (see Hadoop).
#3 is difficult to do with SQL databases (like MySQL) because every counter update involves a lot of disk locking. MongoDB, by contrast, isn’t constantly locking the disk and tends to have much better write throughput.
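To make the real-time counter idea in #3 concrete, here is a minimal in-memory sketch in Python. It is not MongoDB code: the class and method names are hypothetical, and a plain dictionary stands in for the database. In MongoDB the equivalent write would typically be an update using the `$inc` operator with upsert enabled, but that part is an assumption about how you would wire it up, not something the docs quote above prescribes.

```python
class RealTimeClickCounter:
    """Hypothetical helper: keeps one running total per page, updated on every click."""

    def __init__(self):
        # page -> total clicks so far (stands in for one counter document per page)
        self.totals = {}

    def record_click(self, page):
        # One small write per click: bump the pre-aggregated total in place.
        self.totals[page] = self.totals.get(page, 0) + 1

    def report(self, page):
        # Reporting is a single lookup; there is no scan over raw click events.
        return self.totals.get(page, 0)


counter = RealTimeClickCounter()
for page in ["/home", "/pricing", "/home"]:
    counter.record_click(page)

print(counter.report("/home"))
```

The point of the design is that the aggregation happens at write time, so the cost of a report does not grow with the number of clicks. The price is a write on every single click, which is why write throughput matters so much here.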
So MongoDB ends up being very good for such “real-time counters”. This is what they mean by “predistilled or aggregated in runtime”.
But if MongoDB has great write throughput, shouldn’t it be good at doing batch jobs?
In theory, this may be true, and MongoDB does support Map/Reduce. However, MongoDB’s Map/Reduce is currently quite slow and not on par with other Map/Reduce engines like Hadoop. On top of that, the Business Intelligence (BI) field is filled with many other tools that are highly specialized and likely better suited than MongoDB.
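For comparison, a batch-style count in the Map/Reduce shape looks roughly like the sketch below. It is plain Python running over an in-memory list, not MongoDB's or Hadoop's actual API; the function names and the `click_log` records are made up for illustration.

```python
from collections import defaultdict


def map_click(click):
    # Map step: emit a (key, value) pair for each raw click event.
    return (click["page"], 1)


def reduce_counts(values):
    # Reduce step: collapse all values emitted for one key into a total.
    return sum(values)


def count_clicks_batch(click_log):
    # Group the mapped values by key, then reduce each group.
    grouped = defaultdict(list)
    for click in click_log:
        key, value = map_click(click)
        grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}


# Hypothetical raw click log, as a nightly job might read it.
click_log = [{"page": "/home"}, {"page": "/pricing"}, {"page": "/home"}]
totals = count_clicks_batch(click_log)
print(totals)
```

Unlike a real-time counter, this job has to read every raw event each time it runs, so its speed depends on how fast the engine can scan and group data rather than on write throughput.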