The Archive Base

Editorial Team
Asked: June 9, 2026

I have gone through several articles and examples, and have yet to find an efficient way to run this SQL query in MongoDB, where the collection holds millions of documents:

SELECT COUNT(DISTINCT myIndexedNonUniqueField) FROM myCollection

First attempt

Taken from an almost-duplicate question ("Mongo equivalent of SQL's SELECT DISTINCT?"):

db.myCollection.distinct("myIndexedNonUniqueField").length

Because my dataset is huge, this predictably failed with:

Thu Aug 02 12:55:24 uncaught exception: distinct failed: {
        "errmsg" : "exception: distinct too big, 16mb cap",
        "code" : 10044,
        "ok" : 0
}
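A back-of-the-envelope estimate suggests why this fails: distinct() returns all distinct values in a single reply document, so the values array alone has to fit under the 16MB BSON cap. The sketch below is plain JavaScript, not a MongoDB API; the helper names are hypothetical, it assumes every value is a string, and it ignores the rest of the reply document.

```javascript
// Hypothetical helpers (not a MongoDB API): estimate the BSON size of the
// array that distinct() would have to return in a single reply document.
// A BSON array is encoded as a document whose keys are "0", "1", "2", ...

const BSON_LIMIT = 16 * 1024 * 1024; // the 16MB cap from the error message

// Exact encoded size of an array of strings.
function bsonStringArraySize(values) {
  const enc = new TextEncoder();
  let size = 4 + 1; // int32 total length + trailing 0x00
  values.forEach((v, i) => {
    const nameBytes = String(i).length + 1;          // key cstring: digits + NUL
    const valueBytes = 4 + enc.encode(v).length + 1; // int32 length + UTF-8 bytes + NUL
    size += 1 + nameBytes + valueBytes;              // 1 type byte (0x02 = string)
  });
  return size;
}

// Same formula for n ASCII strings of `len` characters, without building the array.
function estimateDistinctArraySize(n, len) {
  let size = 4 + 1;
  for (let i = 0; i < n; i++) {
    size += 1 + (String(i).length + 1) + (4 + len + 1);
  }
  return size;
}

console.log(bsonStringArraySize(["a", "bb"])); // 24

const est = estimateDistinctArraySize(1000000, 20);
console.log(est, est > BSON_LIMIT); // 32888895 true
```

Under these assumptions, one million distinct 20-character strings already need about 33MB for the array, roughly twice the cap, before anything else in the reply is counted.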

Second attempt

Next I tried a group():

db.myCollection.group({key: {myIndexedNonUniqueField: 1},
                       initial: {count: 0},
                       reduce: function (obj, prev) { prev.count++; } });

But I got this error message instead:

exception: group() can't handle more than 20000 unique keys

Third attempt

I haven't tried this yet, but several suggestions involve mapReduce, for example:

  • how to do distinct and group in mongodb? (not accepted; neither the answer author nor the OP tested it)
  • MongoDB group by Functionalities (seems similar to the second attempt)
  • http://blog.emmettshear.com/post/2010/02/12/Counting-Uniques-With-MongoDB
  • https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/trDn3jJjqtE
  • http://cookbook.mongodb.org/patterns/unique_items_map_reduce/

Also

There also seems to be a pull request on GitHub that changes the .distinct method so it can return only a count, but it is still open: https://github.com/mongodb/mongo/pull/34

At this point I thought it was worth asking here: what is the latest on this subject? Should I move to SQL or another NoSQL database for distinct counts, or is there an efficient way to do this in MongoDB?

Update:

This comment on the official MongoDB docs is not encouraging. Is it accurate?

http://www.mongodb.org/display/DOCS/Aggregation#comment-430445808

Update 2:

The new Aggregation Framework seems to address the comment above (MongoDB 2.1/2.2 and later; a development preview is available, not for production):

http://docs.mongodb.org/manual/applications/aggregation/


1 Answer

    Editorial Team
    Answered: June 9, 2026 at 3:26 am

    1) The easiest way to do this is with the aggregation framework. It takes two $group stages: the first groups the documents by distinct value, and the second counts those groups.

    pipeline = [ 
        { $group: { _id: "$myIndexedNonUniqueField"}  },
        { $group: { _id: 1, count: { $sum: 1 } } }
    ];
    
    //
    // Run the aggregation command
    //
    R = db.runCommand( 
        {
        "aggregate": "myCollection" , 
        "pipeline": pipeline
        }
    );
    printjson(R);
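
What the two $group stages compute can be modelled in plain JavaScript. This is a hypothetical in-memory sketch, not the MongoDB driver API; the function names are illustrative, it assumes scalar field values, and it mirrors MongoDB's behaviour of grouping documents that lack the field under a null key.

```javascript
// Hypothetical in-memory model of the aggregation pipeline above.
// Not the MongoDB API: documents are plain objects in an array.

// Stage 1: { $group: { _id: "$myIndexedNonUniqueField" } }
// One output document per distinct value; a missing field groups under null.
function groupByField(docs, field) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[field] === undefined ? null : doc[field];
    if (!groups.has(key)) groups.set(key, { _id: key });
  }
  return [...groups.values()];
}

// Stage 2: { $group: { _id: 1, count: { $sum: 1 } } }
// Collapses every stage-1 document into one, adding 1 per input document.
// Like $group, it produces no output at all when there is no input.
function groupAllAndCount(docs) {
  if (docs.length === 0) return [];
  const count = docs.reduce((sum) => sum + 1, 0);
  return [{ _id: 1, count }];
}

const docs = [
  { myIndexedNonUniqueField: "a" },
  { myIndexedNonUniqueField: "b" },
  { myIndexedNonUniqueField: "a" },
  {}, // no field: grouped under null, so it counts as one distinct value
];

const result = groupAllAndCount(groupByField(docs, "myIndexedNonUniqueField"));
console.log(JSON.stringify(result)); // [{"_id":1,"count":3}]
```

The final result is a single small document, so the reply no longer grows with the number of distinct values the way the distinct() array does; the first stage still tracks one entry per distinct value on the server, and in later MongoDB versions a very large $group may need the allowDiskUse option. Note also that, unlike SQL's COUNT(DISTINCT ...), which ignores NULLs, this pipeline counts null or missing values as one more distinct value.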
    

    2) You can also do this with Map/Reduce. This is a two-phase process as well: the first phase builds a new collection containing every distinct value of the key, and the second runs count() on that new collection.

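    var SOURCE = db.myCollection;
    var DEST = db.getCollection("distinct");   // output collection, one document per distinct value
    DEST.drop();

    // Emit one record per document, keyed by the field value
    var map = function() {
      emit( this.myIndexedNonUniqueField , {count: 1});
    };

    // Combine all records that share a key; MongoDB may call this repeatedly,
    // so it returns the same shape it receives
    var reduce = function(key, values) {
      var count = 0;

      values.forEach(function(v) {
        count += v['count'];        // bonus: how often this distinct value occurs
      });

      return {count: count};
    };

    //
    // run map/reduce, writing the results into the 'distinct' collection
    //
    var res = SOURCE.mapReduce( map, reduce,
        { out: 'distinct',
          verbose: true
        }
    );

    print( "distinct count= " + res.counts.output );
    print( "distinct count=", DEST.count() );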
    

    Note that you cannot return the map/reduce result inline, because with this many distinct values it could exceed the 16MB document size limit. Instead, save the result in a collection and count() that collection, or read the number of results from the return value of mapReduce().
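
The map/reduce phases can be modelled the same way. This is again plain JavaScript, with a hypothetical runMapReduce function standing in for mapReduce() and assuming scalar keys; the number of output keys corresponds to res.counts.output and DEST.count() in the shell script. It also checks the property MongoDB relies on: reduce may be re-applied to its own partial output, so it must return the same shape it receives ({count: n}).

```javascript
// Hypothetical in-memory model of the map/reduce approach above.
// Not the MongoDB API: runMapReduce stands in for collection.mapReduce().

// Same logic as the shell map function: one {count: 1} per document.
function mapDoc(doc, emit) {
  emit(doc.myIndexedNonUniqueField, { count: 1 });
}

// Same logic as the shell reduce function: sum the partial counts.
function reduceValues(key, values) {
  let count = 0;
  values.forEach((v) => { count += v.count; });
  return { count };
}

// Phase 1: collect emitted values per key, then reduce each key's list.
// The resulting Map plays the role of the output collection (one entry per distinct value).
function runMapReduce(docs) {
  const emitted = new Map();
  for (const doc of docs) {
    mapDoc(doc, (key, value) => {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    });
  }
  const out = new Map();
  for (const [key, values] of emitted) {
    out.set(key, reduceValues(key, values));
  }
  return out;
}

const docs = ["a", "b", "a", "c", "a"].map((v) => ({ myIndexedNonUniqueField: v }));
const out = runMapReduce(docs);

// Phase 2: the distinct count is the size of the output "collection".
console.log(out.size);           // 3
console.log(out.get("a").count); // 3

// Re-reduce check: reducing a partial result together with new values gives the same total.
const partial = reduceValues("a", [{ count: 1 }, { count: 1 }]);
console.log(reduceValues("a", [partial, { count: 1 }]).count); // 3
```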


Related Questions

There are several other questions about this topic that I have gone through, but
Have gone through several questions on this topic at SO, and am unable to
I have gone through this to change the color of my title bar. I'm
I have gone through this page... http://www.php.net/manual/en/function.hash.php MD5 is 32 characters long while sha1
I have gone through the fadein/fadeout tutorial that is on this page: For the
I have gone through many posts on SO regarding this issue: Tried everything in
I have gone through different questions/articles on Message Brokers and ESBs(Even on stackoverflow). Still
I have gone through several tutorials on fragments and I can't get my queries
I have gone through the several samples in the web.I understood to protect software
I am starting to learn python. I have gone through several tutorials and now

© 2021 The Archive Base. All Rights Reserved