The Archive Base

Editorial Team
Asked: May 29, 2026

I’m building an application that stores lots of data per user (possibly in gigabytes).

I’m building an application that stores lots of data per user (possibly in gigabytes).

Something like a request log, so let's say you have the following fields for every record:

customer_id
date
hostname
environment
pid
ip
user_agent
account_id
user_id
module
action
id
response code
response time (range)

and possibly some more.

The good thing is that the usage will be mostly write-only, but when there are reads
I’d like to be able to answer them quickly, in near real time.

Another prediction about the usage pattern is that most of the time people will be looking at the most recent data
and will only infrequently query the past, aggregate, etc. My guess is that the working set will be much smaller than
the whole database, i.e. recent data for most users, plus ranges of history for the few users who are doing analytics right now.
For the latter case, I suppose it's OK for the first query to be slower until the range is loaded into memory.

But the problem is that I'm not quite sure how to index the data effectively.

The start of the index is clear: it's customer_id and date. But the rest can be
used in any combination, and I can’t predict the most common ones, at least not with any degree of certainty.
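
To make the indexing concern concrete, here is a minimal sketch of the prefix rule for compound indexes, written as plain Python rather than against a live database. It is a simplified model, not MongoDB's query planner: it only counts how many leading index fields a query constrains. The index layout `BASE_INDEX` and the function name are assumptions chosen for illustration.

```python
from typing import Iterable, Sequence


def usable_prefix_length(index_fields: Sequence[str], query_fields: Iterable[str]) -> int:
    """Count how many leading index fields the query constrains.

    Simplified model: a compound index helps a query only through an
    unbroken run of fields starting at the first index field.
    """
    constrained = set(query_fields)
    length = 0
    for field in index_fields:
        if field not in constrained:
            break
        length += 1
    return length


# Hypothetical compound index that starts with the two known fields.
BASE_INDEX = ("customer_id", "date", "module", "action")

# Constrains customer_id and date only: the two-field prefix is usable.
recent = usable_prefix_length(BASE_INDEX, ["customer_id", "date"])

# Skips "module", so "action" cannot extend the usable prefix.
by_action = usable_prefix_length(BASE_INDEX, ["customer_id", "date", "action"])

# No customer_id at all: this index does not help.
by_ip = usable_prefix_length(BASE_INDEX, ["ip", "user_agent"])
```

Under this model, any query that includes customer_id and date gets at least the two-field prefix, which is why the unpredictable combinations of the remaining fields are the hard part.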

We are currently prototyping this with MongoDB. Is there a way to do it effectively in MongoDB (in terms of storage, CPU, and cost)?

The only thing that comes to mind is to try to predict a couple of frequent queries and index them, then massively shard the data
and ensure that each customer’s data is spread evenly over the shards, so that the rest of the queries can do a fast scan
over just the ‘customer, date’ index.
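
The "spread each customer evenly over the shards" idea can be simulated in a few lines. This is a rough sketch, assuming a hash of the customer and record ids decides placement; it is not MongoDB's own hashed sharding, and `NUM_SHARDS`, `shard_for`, and the customer id 42 are hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(customer_id: int, record_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from a hash of (customer_id, record_id).

    Including the record id in the hashed key is what spreads a single
    customer's records over all shards instead of pinning them to one.
    """
    key = f"{customer_id}:{record_id}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Simulate 10,000 records for one hypothetical customer.
placement = Counter(shard_for(customer_id=42, record_id=i) for i in range(10_000))
```

The trade-off is that every query for one customer then has to touch all shards. That suits the parallel scan described above, but it also means no query can be routed to a single shard.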

P.S. I’m also open to suggestions about db alternatives.

1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 8:40 am

    With this limited number of fields, you could potentially just have an index on each of them, perhaps in combination with customer_id. MongoDB is then clever enough to pick the fastest index for each query. If you can fit your whole data set in memory (a few GB is not a lot of data!), then none of this really matters.

    You say you have a GB per user, but you can still have an index on each field, as there are only about a dozen. And with that much data, you will want sharding at some point soon anyway.

    cheers,
    Derick
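
As an illustration of the "one index per field, combined with customer_id" suggestion, the sketch below builds the index specifications as plain Python data. The field names follow the question; `response_code` and `response_time` are assumed spellings, and the collection name in the final comment is hypothetical.

```python
from typing import List, Tuple

ASCENDING = 1  # same value as pymongo.ASCENDING

# Field names follow the question; "response_code" and "response_time"
# are assumed spellings for "response code" and "response time (range)".
FIELDS = [
    "date", "hostname", "environment", "pid", "ip", "user_agent",
    "account_id", "user_id", "module", "action", "id",
    "response_code", "response_time",
]

IndexSpec = List[Tuple[str, int]]


def per_field_indexes(fields: List[str], prefix: str = "customer_id") -> List[IndexSpec]:
    """Build one two-field index spec per field, each led by `prefix`."""
    return [[(prefix, ASCENDING), (field, ASCENDING)] for field in fields if field != prefix]


specs = per_field_indexes(FIELDS)

# With pymongo installed and a collection handle (hypothetical name `requests`),
# each spec could be passed along as: requests.create_index(spec)
```

Each extra index has to be updated on every insert and takes up memory, so on a mostly-write workload like this one the number of indexes is worth measuring rather than assuming.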
