The Archive Base Latest Questions

Editorial Team
Asked: June 6, 2026 (2026-06-06T10:30:03+00:00)

First of all it’s my first time in Mongo… Concept: A user is able

First of all it’s my first time in Mongo…

Concept:

  1. A user is able to describe an image in natural language.
  2. The user input is split into words, and those words are stored in a collection called words.
  3. Users must be able to go through the most used words and add those words to their description.
  4. The system will take the most used words (across all users) and use them to describe the image.

My words document (currently) is as follows (example)

{
"date": "date it was inserted",
"reported": 0,
"image_id": "image id",
"image_name": "image name",
"user": "user _id",
"word": "awesome"
}

The words will be duplicated so that each word can be associated to a user…

Problem: I need to perform a Mongo query to allow me to know the most used words (to describe an image) that were not created by a given user. (to meet point 3. above)

I’ve seen MapReduce algorithm, but from what I read there are a couple of issues with it:

  1. Can’t sort results (I can’t order them from the most used to the least used)
  2. With millions of documents, processing can take a long time.
  3. Can’t limit the number of results returned

I’ve thought about running a task at a given time each day that stores, in a document in a different collection, the ranked list of words that a given user hasn’t used to describe the given image. I would have to limit this to 300 results or so (any idea what a proper limit would be?). Something like:

{
user_id: "the user id",
words: [
{word: "test", count: 1000},
{word: "test2", count: 980},
{word: "etc", count: 300}
]
}

Problems I see with this solution are:

  1. Results would have quite a delay which is not desirable.
  2. Server load could spike while generating these documents for all users (I actually know very little about this in Mongo, so this is just an assumption)

Maybe my approach doesn’t make any sense… And maybe my lack of experience in Mongo is pointing me at the wrong “schema design”.

Any idea of what could be a good approach for this kind of problem?

Sorry for the big post and thanks for your time and help!

João

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 10:30 am

    As already mentioned, you could use the group command, which is easy to use, but you will need to sort the result on the client side. The result is also returned as a single BSON object, so it must be fairly small (fewer than 10,000 keys); otherwise you will get an exception.

    Code example based on your data structure:

    db.words.group({
        key : {"word" : true},
        initial: {count : 0},
        reduce: function(obj, prev) { prev.count++},
        cond: {"user" :{ $ne : "USERNAME_TO_IGNORE"}}
    })
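
    Because group returns its result unsorted, the ranking has to happen in the client. Below is a minimal sketch in plain JavaScript, assuming the result is an array of {word, count} entries as produced by the command above. The helper name topWords and the sample data are made up for illustration.

    ```javascript
    // Hypothetical sample of what db.words.group(...) returns: one entry per word.
    const groupResult = [
        { word: "sky", count: 12 },
        { word: "awesome", count: 40 },
        { word: "tree", count: 7 },
        { word: "blue", count: 25 }
    ];

    // Sort a copy by count (descending) and keep only the first `limit` entries.
    function topWords(result, limit) {
        return result
            .slice()
            .sort(function (a, b) { return b.count - a.count; })
            .slice(0, limit);
    }

    const top2 = topWords(groupResult, 2);
    console.log(top2.map(function (e) { return e.word; }).join(", ")); // prints "awesome, blue"
    ```

    Sorting a copy keeps the original result untouched, which is convenient if the same result is reused for other rankings.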
    

    Another option is to use the new Aggregation Framework, which will be released in version 2.2. Something like this should work:

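    db.words.aggregate([
       // each pipeline stage must be its own document
       { $match : { "user" : { "$ne" : "USERNAME_TO_IGNORE"} } },
       { $group : {
           _id : "$word",
           count: { $sum : 1}
       } },
       // most used words first, top 10 only
       { $sort : { count : -1 } },
       { $limit : 10 }
    ])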
    

    Or you can still use MapReduce. You can in fact sort and limit the output, because the result is a collection: just use .sort() and .limit() on it. You can also use the incremental map-reduce output option, which will help with your performance issues. Have a look at the out parameter in MapReduce.

    Below is an example that uses the incremental feature to merge new data into the existing words_usage collection:

    m = function() { 
       emit(this.word, {count: 1}); 
    };
    
    
    r = function( key , values ){
         var sum = 0;
         values.forEach(function(doc) {
              sum += doc.count;
         });
         return {count: sum};
     };
    
    db.runCommand({
        mapreduce : "words", 
        map : m,
        reduce : r,
        out : { reduce: "words_usage"},
        query : <query filter object>
    })
    
    // retrieve the top 10 words
    db.words_usage.find().sort({"value.count" : -1}).limit(10)
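
    With out : { reduce: ... }, when a key already exists in the output collection, the reduce function is applied again to the stored value and the newly computed one. That is why r returns the same {count: ...} shape that m emits. The sketch below simulates that merge in plain JavaScript. mergeIncremental and the sample data are hypothetical stand-ins for what the server does, not MongoDB API.

    ```javascript
    // Same logic as the reduce function r above: sums the partial counts for one word.
    function reduceCounts(key, values) {
        var sum = 0;
        values.forEach(function (doc) { sum += doc.count; });
        return { count: sum };
    }

    // Hypothetical stand-in for out: {reduce: "words_usage"}: for every key in the
    // fresh result, re-reduce it with the stored value if there is one.
    function mergeIncremental(existing, fresh) {
        var merged = Object.assign({}, existing);
        Object.keys(fresh).forEach(function (word) {
            merged[word] = merged.hasOwnProperty(word)
                ? reduceCounts(word, [merged[word], fresh[word]])
                : fresh[word];
        });
        return merged;
    }

    // Counts already stored in words_usage, and counts from the newest documents.
    const stored = { awesome: { count: 40 }, blue: { count: 25 } };
    const newBatch = { awesome: { count: 3 }, tree: { count: 1 } };

    const usage = mergeIncremental(stored, newBatch);
    console.log(JSON.stringify(usage));
    ```

    The merge only stays correct if each words document is processed once, which is what the query filter is for.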
    

    You can run the above MapReduce command from a cron job every few minutes or hours, depending on how accurate you need the results to be. For the query criteria of each update, you can use the creation date of the words documents.
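
    One way to build that query filter is sketched below. It assumes the date field of each words document is stored as a Date, which the question does not state, and that the time of the previous run is kept somewhere. buildIncrementalQuery and lastRun are hypothetical names.

    ```javascript
    // Hypothetical helper: build the `query` filter for an incremental run so that
    // only words inserted after the previous run are mapped and reduced.
    // Assumes the `date` field of each words document is stored as a Date.
    function buildIncrementalQuery(lastRun) {
        return { date: { $gt: lastRun } };
    }

    // Hypothetical timestamp of the previous run; in practice it would have to be
    // persisted between runs (for example in a small bookkeeping collection).
    const lastRun = new Date("2026-06-06T10:00:00Z");
    const query = buildIncrementalQuery(lastRun);

    // The filter would then be passed as the `query` option:
    // db.runCommand({ mapreduce: "words", map: m, reduce: r,
    //                 out: { reduce: "words_usage" }, query: query })
    console.log(JSON.stringify(query));
    ```

    The timestamp saved for the next run should be taken before the command starts. Otherwise, documents inserted while it runs could be skipped by the following run.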

    Once you have the system-wide top words collection, you can build per-user top words from it, or just compute them in real time (depending on the size of the system).
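
    As a rough illustration of the real-time variant, the sketch below takes the system-wide ranking and drops the words a user has already used. It assumes that "per user top words" means words the user has not used yet, as in point 3 of the question. suggestWordsFor and the sample data are hypothetical.

    ```javascript
    // Hypothetical sample of words_usage documents, already sorted by count
    // (map-reduce output documents have the shape {_id: key, value: reducedValue}).
    const systemTop = [
        { _id: "awesome", value: { count: 43 } },
        { _id: "blue", value: { count: 25 } },
        { _id: "sky", value: { count: 12 } },
        { _id: "tree", value: { count: 8 } }
    ];

    // Words the given user has already used, e.g. the result of
    // db.words.distinct("word", {user: "USERNAME_TO_IGNORE"}) in the shell.
    const usedByUser = ["blue", "tree"];

    // Keep the ranking order, skip words the user already has, stop at `limit`.
    function suggestWordsFor(topWords, userWords, limit) {
        const used = new Set(userWords);
        return topWords
            .filter(function (doc) { return !used.has(doc._id); })
            .slice(0, limit)
            .map(function (doc) { return doc._id; });
    }

    const suggestions = suggestWordsFor(systemTop, usedByUser, 2);
    console.log(suggestions.join(", ")); // prints "awesome, sky"
    ```

    For large systems the same filtering could be precomputed per user, at the cost of the delay described in the question.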
