The Archive Base

Editorial Team
Asked: June 13, 2026

Regarding my previous question ( How to quickly build large scale analytics server? )


Regarding my previous question (How to quickly build large scale analytics server?), I’m now feeding my analytics data to MongoDB: every event (with a bunch of metadata) gets its own document in the views collection.

However, I’ve now hit the next roadblock: now that inserts are done and analytics data is flowing, what would be the path of least resistance for running queries against that data? The idea is that once the data is sharded, specific views would run mapReduces (say, all events with a specific ID going one month back).

So, my question is: as I’m quite new to MongoDB, what steps do I need to take to make those mapReduces as fast as possible? Should I structure the raw data differently, or is one document per event correct? Are there Mongo-specific tricks that make things flow faster when running against a dataset that gets millions of inserts per day?

I would prefer to keep my technology stack as simple as possible (Node.js + MongoDB), so I’d prefer if things could be done without introducing extra technology (like say Hadoop).

Example document of an event:

{
    id: 'abc',
    ip: '1.1.1.1',
    type: 'event1',
    timestamp: 1234,
    metadata: {
        client: 'client1'
    }
}

All main aggregations will be ID-centric, analysing the events for a given ID; the most used one will be "get all events with this ID for the last month". Minor aggregations would group things by metadata (what percentage used client1 vs. client2, etc.). All the aggregations will be system-defined, so users can’t set them up themselves, at this point at least. Thus, as far as I understand, the sharding should be done on ID, since the big majority of the aggregations will be ID-centric? Also, this should mean that the most recent events for any given ID are always in memory, as Mongo keeps the latest stuff in memory and only dumps the overflow to disk.
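
The ID-centric access pattern described above can be sketched roughly as follows. This is only an illustration under assumptions: `timestamp` is treated as Unix time in milliseconds (the example document does not say), and `viewsIndexSpec`, `lastMonthFilter` and `clientShare` are hypothetical names, not part of MongoDB.

```javascript
// Sketch only: index spec and query filter for "all events for one ID in the
// last 30 days". Assumes `timestamp` is Unix time in milliseconds.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Compound index: equality on id first, then the range on timestamp.
const viewsIndexSpec = { id: 1, timestamp: 1 };

// Hypothetical helper: returns a filter object usable with collection.find().
function lastMonthFilter(id, nowMs) {
  return { id: id, timestamp: { $gte: nowMs - THIRTY_DAYS_MS, $lt: nowMs } };
}

// Hypothetical helper: percentage share per metadata.client, computed in
// memory over an array of event documents.
function clientShare(events) {
  const counts = {};
  for (const e of events) {
    const client = e.metadata.client;
    counts[client] = (counts[client] || 0) + 1;
  }
  const share = {};
  for (const client of Object.keys(counts)) {
    share[client] = (counts[client] / events.length) * 100;
  }
  return share;
}
```

With the official Node.js driver, these objects would presumably be passed as `collection.createIndex(viewsIndexSpec)` and `collection.find(lastMonthFilter('abc', Date.now()))`.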

Also, real time is not a requirement. Although, ofc it would be nice. 😛

Edit: Added example data

Edit: Title should have been “…per day” not “…per page” + more spec about the aggregation sets

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 3:06 pm

    To expand on the other answer with my own flavour: map reduce is a solid option for a lot of your processing. Make no mistake, map reduce will be a vital core of your analytics if you do it fully.

    The best sort of MR is an incremental one that builds things like archival statistics on certain events and people, so that you shrink the overall database size and the working set needed to pull out all that old, dusty data.
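
A minimal sketch of the incremental idea, simulated in plain JavaScript rather than run against MongoDB: each run reduces only the new events and then re-reduces the result together with the totals already archived. MongoDB's `mapReduce` offers an `out: { reduce: <collection> }` mode for this kind of merge; the helper names below (`mapEvent`, `reduceCounts`, `incrementalRun`) and the choice to key on ID plus type are assumptions made for illustration.

```javascript
// Plain-JavaScript simulation of an incremental map-reduce; it does not talk
// to MongoDB. Field names follow the example event document.

// map: one (key, value) pair per event, keyed by event id and type.
function mapEvent(event) {
  return { key: event.id + ':' + event.type, value: { count: 1 } };
}

// reduce: must be associative and accept its own output as input, so that
// earlier totals can be folded together with new partial results.
function reduceCounts(values) {
  return { count: values.reduce((sum, v) => sum + v.count, 0) };
}

// Folds a batch of new events into the existing archive (a plain object
// standing in for the output collection) and returns the updated archive.
function incrementalRun(archive, newEvents) {
  const grouped = {};
  for (const event of newEvents) {
    const { key, value } = mapEvent(event);
    (grouped[key] = grouped[key] || []).push(value);
  }
  const result = { ...archive };
  for (const key of Object.keys(grouped)) {
    const partial = reduceCounts(grouped[key]);
    result[key] = key in result ? reduceCounts([result[key], partial]) : partial;
  }
  return result;
}
```

Because only the new batch is mapped on each run, the raw events behind the archived totals no longer need to stay in the working set.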

    As for realtime usage, I have found (from many discussions on the subject in the early days of MongoDB in the Google user group) that the best way is to pre-aggregate your data so that a simple linear query like db.some_preaggregated_data.find({date: {$gt: start, $lt: end}}) will get your results. This eases both sharding the data and ranging over it, not to mention reducing your working set size, which overall gives a much more performant operation.
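
One way pre-aggregation could look, as a sketch under assumptions: each raw event is turned into an upsert that increments counters in a one-document-per-ID-per-day bucket. The bucket layout (`total`, `types`, `clients`), the helper names and the millisecond `timestamp` are all hypothetical and not taken from the question.

```javascript
// Sketch: turn one raw event into an upsert against a pre-aggregated,
// one-document-per-ID-per-day collection. Assumes `timestamp` is Unix time in
// milliseconds; the bucket layout is hypothetical.
const DAY_MS = 24 * 60 * 60 * 1000;

function dailyCounterUpdate(event) {
  // Round the timestamp down to the start of its UTC day.
  const dayStart = Math.floor(event.timestamp / DAY_MS) * DAY_MS;
  return {
    filter: { id: event.id, date: new Date(dayStart) },
    update: {
      $inc: {
        total: 1,
        ['types.' + event.type]: 1,
        ['clients.' + event.metadata.client]: 1,
      },
    },
  };
}

// Range filter for the simple find() over the pre-aggregated data.
function dateRangeFilter(id, from, to) {
  return { id: id, date: { $gt: from, $lt: to } };
}
```

With the Node.js driver, the result would presumably be applied as `collection.updateOne(filter, update, { upsert: true })`, so a month of data for one ID becomes a range read over about thirty small documents.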

    I would recommend one thing: don’t go into the aggregation framework or complex queries if you really expect this to scale fully in realtime. They will start to create too much work on such a hugely expansive data set. You will normally need everything (locking, working set and so on) on your side, which means using simple find() queries across a big-table-style format to satisfy your needs.
