So I’m moving some of my code from SQL to MongoDB, and there are a few things that are not yet clear to me.
Let’s say I have the following simple SQL query (just an example):
select count(id) as count, b_id
from table
where c_id=[SOME ID]
group by b_id
order by count desc;
I assume everyone understands what that does.
Now with Mongo I can take several approaches: do it all on the Mongo side, fetch summed results and sort them client side, or just pull the raw data to the client and do all the processing there.
What would be the best approach for the query above: do it all in the database with some internal MongoDB mechanism (MapReduce etc.), or fetch the collection to the client side and process it there? The dataset in general will be huge, but the query can be split into several parts if necessary.
The client is Java based if that matters.
With the upcoming MongoDB Aggregation Framework it’s pretty easy to do what you need. It’s already available in the 2.1.x development releases.
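For the query in the question, the pipeline would roughly be a $match on c_id, a $group on b_id with a {$sum: 1} counter, and a $sort on that counter in descending order. To make the semantics concrete without depending on a driver version, here is a minimal plain-Java sketch that mimics those three stages on an in-memory list. The Row class, the sample data and the countByB name are made up for illustration; this is not the driver API, and it is also roughly what the "process it on the client" fallback would look like.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupCountSketch {
    // Hypothetical stand-in for one document/row: id, b_id and c_id.
    static final class Row {
        final int id, bId, cId;
        Row(int id, int bId, int cId) {
            this.id = id;
            this.bId = bId;
            this.cId = cId;
        }
    }

    // Mirrors: where c_id = ? / group by b_id / order by count desc.
    static List<Map.Entry<Integer, Long>> countByB(List<Row> rows, int cId) {
        // "$match" then "$group": keep matching rows, count them per b_id.
        Map<Integer, Long> counts = rows.stream()
                .filter(r -> r.cId == cId)
                .collect(Collectors.groupingBy(r -> r.bId, Collectors.counting()));

        // "$sort": highest count first; ties broken by b_id to keep output stable.
        List<Map.Entry<Integer, Long>> result = new ArrayList<>(counts.entrySet());
        result.sort((x, y) -> {
            int byCount = Long.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : Integer.compare(x.getKey(), y.getKey());
        });
        return result;
    }

    // Made-up sample data, only for the demonstration below.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row(1, 10, 7),
                new Row(2, 20, 7),
                new Row(3, 20, 7),
                new Row(4, 30, 8));
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, Long> e : countByB(sampleRows(), 7)) {
            System.out.println("b_id=" + e.getKey() + " count=" + e.getValue());
        }
    }
}
```

With a huge dataset this in-memory version only makes sense per chunk, which is why pushing the match and group steps into the database is usually the better deal when the server supports it.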
If you’re stuck on 2.0 or earlier, you’ll have to look at either the options you mention or schema changes that avoid on-the-spot aggregation in the first place. For example, it’s pretty common in NoSQL to maintain a field or document holding the aggregated data and update it whenever the source data is manipulated. The most common example is maintaining the size of an array as a field: