Apparently MapReduce queries are among the slowest things one can do in MongoDB, according to this and this article.
-
If the dataset is large, isn’t it still better to run MapReduce than to send the entire dataset over to the client and process it there?
-
Do MapReduce queries lock the database and therefore stop it from responding to other requests?
-
I find MapReduce really logical and easy to understand, while the Aggregation Framework in version 2.1 appears a little overwhelming! Is this MongoDB’s response to not having a performant MapReduce facility, and therefore a suggestion to move away from MapReduce altogether?
Depends on your definition of “large”, but I’d still choose to run map-reduce (on a secondary, so that the primary is not blocked).
Map-reduce jobs take lots of short-lived locks.
2.1. They take read locks when they read data. Read locks are shared, so this doesn’t block other reads.
2.2. They take write locks when they write data (to the temp or final collection). This blocks other operations.
2.3. They take the JS lock whenever they need to execute JavaScript, which is all the time, because the map and reduce functions are written in JavaScript. Here’s a typical sequence in the map phase: “take read lock, fetch document from input collection, release read lock, take JS lock, apply map function to that document, release JS lock, take write lock, write an entry to temp collection, release write lock”.
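To make that sequence concrete, here is a minimal Python sketch of the map phase. It is only a simulation: the three `threading.Lock` objects, the `map_phase` helper and the sample `orders` documents are hypothetical stand-ins, not MongoDB internals. What it shows is that each lock is held for one small step and then released, once per document.

```python
import threading

# Hypothetical stand-ins for the server's locks (illustration only).
read_lock = threading.Lock()
js_lock = threading.Lock()
write_lock = threading.Lock()


def map_phase(input_collection, map_fn):
    """Apply map_fn to every document, holding each lock only briefly."""
    temp_collection = []
    trace = []  # order in which the locks were taken
    for i in range(len(input_collection)):
        with read_lock:  # short read lock: fetch one document
            trace.append("read")
            doc = input_collection[i]
        with js_lock:  # JS lock: run the user-supplied map function
            trace.append("js")
            emitted = map_fn(doc)
        with write_lock:  # short write lock: store the emitted pairs
            trace.append("write")
            temp_collection.extend(emitted)
    return temp_collection, trace


def map_fn(doc):
    # Emit one (key, value) pair per document.
    return [(doc["customer"], doc["amount"])]


orders = [
    {"customer": "a", "amount": 5},
    {"customer": "b", "amount": 7},
    {"customer": "a", "amount": 3},
]
temp, trace = map_phase(orders, map_fn)
```

The recorded `trace` repeats read, js, write for every input document, which is why a long job shows up as many brief lock acquisitions rather than one long one.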
Yes, the basic idea of map-reduce is simple. But I find the aggregation pipeline just as simple, maybe even simpler: “Take this data, apply this array of transformations to it, I’ll take the result”. What could be simpler?
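The “array of transformations” idea can be sketched in a few lines of plain Python. This is an in-memory illustration, not the MongoDB implementation: the `match`, `group_sum` and `aggregate` helpers and the `orders` fields are all hypothetical names chosen for the example.

```python
from collections import defaultdict


def match(predicate):
    """Stage that keeps only documents satisfying predicate (in the spirit of $match)."""
    return lambda docs: [d for d in docs if predicate(d)]


def group_sum(key, field):
    """Stage that sums `field` per distinct `key` (in the spirit of $group with $sum)."""
    def stage(docs):
        totals = defaultdict(int)
        for d in docs:
            totals[d[key]] += d[field]
        # Sort by key so the output order is deterministic.
        return [{"_id": k, "total": v} for k, v in sorted(totals.items())]
    return stage


def aggregate(docs, pipeline):
    """Feed the output of each stage into the next one."""
    for stage in pipeline:
        docs = stage(docs)
    return docs


orders = [
    {"customer": "a", "amount": 5, "status": "paid"},
    {"customer": "b", "amount": 7, "status": "paid"},
    {"customer": "a", "amount": 3, "status": "paid"},
    {"customer": "b", "amount": 2, "status": "refunded"},
]
pipeline = [
    match(lambda d: d["status"] == "paid"),
    group_sum("customer", "amount"),
]
result = aggregate(orders, pipeline)
```

Against a real server, the same intent would roughly correspond to a pipeline such as `[{"$match": {"status": "paid"}}, {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}}]`, assuming a collection with these field names.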