
The Archive Base

Editorial Team
Asked: May 27, 2026


We have a collection of log data, where each document in the collection is identified by a MAC address and a calendar day. Basically:

{
  _id: <generated>,
  mac: <string>,
  day: <date>,
  data: [ "value1", "value2" ]
}

Every five minutes, we append a new log entry to the data array in the current day’s document. At midnight UTC the documents roll over: we create a new document for each MAC.
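A minimal sketch of the append described above, assuming the Node.js MongoDB driver and an upsert keyed on `mac` and `day`. The question does not say how each day's first document is created, so the upsert and the helper names `dayBucket` and `appendLogUpdate` are assumptions for illustration. The sketch only builds the filter and update, so it runs without a server.

```javascript
// Truncate a timestamp to midnight UTC, the boundary at which the
// per-day log documents roll over.
function dayBucket(ts) {
  return new Date(Date.UTC(ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate()));
}

// Hypothetical helper: build the filter and update that append one entry
// to the current day's document for a given MAC.
// Intended use with the Node.js driver (assumption):
//   const { filter, update } = appendLogUpdate(mac, new Date(), value);
//   await collection.updateOne(filter, update, { upsert: true });
// With upsert, the first append after midnight UTC creates the new
// document, seeded with the equality fields from the filter (mac, day).
function appendLogUpdate(mac, ts, value) {
  return {
    filter: { mac: mac, day: dayBucket(ts) },
    update: { $push: { data: value } },
  };
}
```

Because JavaScript `Date` values are absolute instants, a timestamp taken in any local time zone still lands in the correct UTC day bucket.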

We’ve noticed that IO, as measured by bytes written, increases all day long, and then drops back down at midnight UTC. This shouldn’t happen because the rate of log messages is constant. We believe that the unexpected behavior is due to Mongo moving documents, as opposed to updating their log arrays in place. For what it’s worth, stats() shows that the paddingFactor is 1.0299999997858227.

Several questions:

  1. Is there a way to confirm whether Mongo is updating in place or moving? We see some moves in the slow query log, but this seems like anecdotal evidence. I know I can db.setProfilingLevel(2), then db.system.profile.find(), and finally look for "moved:true", but I’m not sure whether it’s ok to do this on a busy production system.
  2. The size of each document is very predictable and regular. Assuming that Mongo is doing a lot of moves, what’s the best way to figure out why Mongo isn’t able to presize more accurately? Or to make Mongo presize more accurately? Assuming that the above description of the problem is right, tweaking the padding factor does not seem like it would do the trick.
  3. It should be easy enough for me to presize the document and remove any guesswork from Mongo. (I know the padding factor docs say that I shouldn’t have to do this, but I just need to put this issue behind me.) What’s the best way to presize a document? It seems simple to write a document with a garbage byte array field and then immediately remove that field from the document, but are there any gotchas that I should be aware of? For example, I can imagine having to wait on the server for the write operation (i.e., do a safe write) before removing the garbage field.
  4. I was concerned about preallocating all of a day’s documents at around the same time because it seems like this would saturate the disk at that time. Is this a valid concern? Should I try to spread out the preallocation costs over the previous day?
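The presizing idea in question 3 and the spreading idea in question 4 can be sketched as below. This is a sketch under stated assumptions, not a tested recipe. It assumes the Node.js driver (which stores a `Buffer` as BSON binary data), log values of a fixed UTF-8 length, a hypothetical filler field named `_pad`, and a hypothetical stagger scheme that hashes each MAC (FNV-1a) to a second within the previous day. `bsonStringArraySize` follows the BSON encoding of an array, which is a document keyed by the decimal indexes "0", "1", and so on.

```javascript
// One entry every five minutes.
const ENTRIES_PER_DAY = (24 * 60) / 5; // 288

// Exact BSON size of an array of n strings, each valueLen bytes in UTF-8.
// BSON encodes an array as a document keyed "0", "1", ...:
//   int32 length prefix + elements + trailing 0x00 byte.
// Each string element: type byte (0x02) + key C string + int32 length
//   + the string bytes + NUL terminator.
function bsonStringArraySize(n, valueLen) {
  let size = 4 + 1;
  for (let i = 0; i < n; i++) {
    const keyLen = String(i).length + 1;
    size += 1 + keyLen + 4 + valueLen + 1;
  }
  return size;
}

// Hypothetical presized insert: the _pad filler roughly matches the size a
// full day's data array will reach, the idea being that the allocated space
// stays with the document after the filler is removed.
function presizedDocument(mac, day, valueLen) {
  const padLen = bsonStringArraySize(ENTRIES_PER_DAY, valueLen);
  return { mac: mac, day: day, data: [], _pad: Buffer.alloc(padLen) };
}

// Update that removes the filler once the insert has been acknowledged.
const UNSET_PAD = { $unset: { _pad: "" } };

// FNV-1a (32-bit) over the MAC string: a cheap, stable hash, used here only
// to spread preallocation times (hypothetical choice).
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Time at which to preallocate `day`'s document for `mac`: a fixed second
// within the 24 hours before `day`, so the inserts spread across that day.
function preallocationTime(mac, day) {
  const offsetSeconds = fnv1a32(mac) % 86400;
  return new Date(day.getTime() - 86400 * 1000 + offsetSeconds * 1000);
}
```

With the driver's default acknowledged write concern, `await collection.insertOne(doc)` resolves only after the server has applied the insert, so a follow-up `collection.updateOne({ _id: result.insertedId }, UNSET_PAD)` runs against the stored document.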
1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 1:27 am

    The following combination seems to cause write performance to fall off a cliff:

    1. Journaling is on.
    2. Writes append entries to an array that makes up the bulk of a larger document.

    Presumably I/O becomes saturated. Changing either of these factors seems to prevent this from happening:

    1. Turn journaling off. Use more replicas instead.
    2. Use smaller documents. Note that document size here is measured in bytes, not in the length of any arrays in the documents.
    3. Journal on a separate filesystem.

    In addition, here are some other tricks that improve write throughput. With the exception of sharding, we found their improvements to be incremental, whereas we were trying to solve a “this doesn’t work at all” kind of problem. I’m including them here in case you’re looking for incremental improvements. The 10Gen folks did some testing and got similar results:

    1. Shard.
    2. Break up long arrays into several arrays, so that your overall structure looks more like a nested tree. If you use hour of the day as the key, then the daily log document becomes:
      {"0":[...], "1":[...],...,"23":[...]}.
    3. Try manual preallocation. (This didn’t help us. Mongo’s padding seems to work as advertised. My original question was misguided.)
    4. Try different --syncdelay values. (This didn’t help us.)
    5. Try without safe writes. (We were already doing this for the log data, and it’s not possible in many situations. Also, this seems like a bit of a cheat.)
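    The hour-keyed layout in item 2 can be sketched as an update that pushes into the current hour's array, so no single array grows past 12 entries (one every five minutes). This assumes the Node.js driver and that the hour-keyed arrays live under the existing `data` field (e.g. `data: {"0": [...], ..., "23": [...]}`); the answer does not say where they sit, and `hourlyPushUpdate` is a hypothetical helper.

```javascript
// Hypothetical helper: build the filter and update that append one entry
// to the current UTC hour's array inside the day's document.
function hourlyPushUpdate(mac, ts, value) {
  const day = new Date(Date.UTC(ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate()));
  // Dotted path such as "data.13". This assumes data is an embedded
  // document (not an array), so "13" is a field name; $push creates that
  // array on the hour's first append.
  const field = "data." + ts.getUTCHours();
  return {
    filter: { mac: mac, day: day },
    update: { $push: { [field]: value } },
  };
}
```

    Usage would mirror the flat layout, e.g. `await collection.updateOne(filter, update, { upsert: true })`; only the update path changes.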

    You’ll notice that I’ve copied some of the suggestions from 10Gen here, just for completeness. Hopefully I did so accurately! If they publish a cookbook example, then I’ll post a link here.



Related Questions

I have a large collection of unique strings (about 500k). Each string is associated
I have a Mongo collection where each document has a set of unique embedded
I have Collection List<Car> . How to compare each item from this collection with
I'm planning to have collection of items stored in a TCollection. Each item will
I have a collection of models, each of which is attached a view. The
I have a data set representing data from a log file which shows users
I have a collection of objects and I need to log their properties to
for example I have collection foo of documents like that: {tag_cloud:[{value:games, count:10}, {value:girls, count:500}]}
I have a collection of my custom entity that is bound to the listpicker
I have a Collection View Source (CVS) implemented much like you see in MSDN

© 2021 The Archive Base. All Rights Reserved