We have a collection of log data, where each document in the collection is


Asked: May 27, 2026 (2026-05-27T01:27:08+00:00)


We have a collection of log data, where each document in the collection is identified by a MAC address and a calendar day. Basically:

{
  _id: <generated>,
  mac: <string>,
  day: <date>,
  data: [ "value1", "value2" ]
}

Every five minutes, we append a new log entry to the data array within the current day’s document. The document rolls over at midnight UTC when we create a new document for each MAC.
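For concreteness, each five-minute append looks roughly like the sketch below. The helper names `startOfUtcDay` and `buildAppend` and the collection name `logs` are made up for illustration; only the `mac`, `day`, and `data` fields come from the schema above.

```javascript
// Hypothetical helpers, for illustration only.

// Truncate a timestamp to midnight UTC, which is the value stored in "day".
function startOfUtcDay(ts) {
  return new Date(Date.UTC(ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate()));
}

// Build the filter and update for appending one log entry to the
// current day's document for a given MAC address.
function buildAppend(mac, ts, entry) {
  return {
    filter: { mac: mac, day: startOfUtcDay(ts) },
    update: { $push: { data: entry } }
  };
}

// Possible mongo shell usage ("logs" is an assumed collection name):
//   var a = buildAppend("00:11:22:33:44:55", new Date(), "value3");
//   db.logs.update(a.filter, a.update);
```

Because `$push` grows the `data` array on every call, the document gets larger with each append, which is why a move (rather than an in-place update) is plausible.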

We’ve noticed that IO, as measured by bytes written, increases all day long, and then drops back down at midnight UTC. This shouldn’t happen because the rate of log messages is constant. We believe that the unexpected behavior is due to Mongo moving documents, as opposed to updating their log arrays in place. For what it’s worth, stats() shows that the paddingFactor is 1.0299999997858227.

Several questions:

Is there a way to confirm whether Mongo is updating in place or moving? We see some moves in the slow query log, but this seems like anecdotal evidence. I know I can db.setProfilingLevel(2), then db.system.profile.find(), and finally look for "moved:true", but I’m not sure whether it’s ok to do this on a busy production system.
The size of each document is very predictable and regular. Assuming that Mongo is doing a lot of moves, what’s the best way to figure out why Mongo isn’t able to presize more accurately? Or to make Mongo presize more accurately? Assuming that the above description of the problem is right, tweaking the padding factor does not seem like it would do the trick.
It should be easy enough for me to presize the document and remove any guesswork from Mongo. (I know the padding factor docs say that I shouldn’t have to do this, but I just need to put this issue behind me.) What’s the best way to presize a document? It seems simple to write a document with a garbage byte array field, and then immediately remove that field from the document, but are there any gotchas that I should be aware of? For example, I can imagine having to wait on the server for the write operation (i.e. do a safe write) before removing the garbage field.
I was concerned about preallocating all of a day’s documents at around the same time because it seems like this would saturate the disk at that time. Is this a valid concern? Should I try to spread out the preallocation costs over the previous day?
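To make the presizing idea concrete, here is a minimal sketch of what I have in mind. The field name `padding`, the helper names, the collection name `logs`, and the 40-byte entry size are all assumptions for illustration; the real entry size would have to be measured.

```javascript
// Sketch only: "padding" is a hypothetical throwaway field, and the
// bytes-per-entry figure is an assumption to be measured on real data.
var ENTRIES_PER_DAY = 24 * 60 / 5; // one entry every five minutes = 288

// Build a day document carrying filler roughly as large as a full
// day's worth of log entries.
function buildPresizedDoc(mac, day, bytesPerEntry) {
  var fillerLength = ENTRIES_PER_DAY * bytesPerEntry;
  var filler = new Array(fillerLength + 1).join("x"); // fillerLength characters
  return { mac: mac, day: day, data: [], padding: filler };
}

// Build the update that removes the filler once the padded document
// has been written.
function buildUnpadUpdate() {
  return { $unset: { padding: "" } };
}

// Possible mongo shell usage ("logs" is an assumed collection name):
//   db.logs.insert(buildPresizedDoc(mac, day, 40));
//   db.logs.update({ mac: mac, day: day }, buildUnpadUpdate());
```

The open question is whether the second step needs to wait for the first write to be acknowledged.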


1 Answer

Editorial Team · Answer 1 · 2026-05-27T01:27:09+00:00

The following combination seems to cause write performance to fall off a cliff:

Journaling is on.
Writes append entries to an array that makes up the bulk of a larger document.

Presumably I/O becomes saturated. Changing either of these factors seems to prevent this from happening:

Turn journaling off. Use more replicas instead.
Use smaller documents. Note that document size here is measured in bytes, not in the length of any arrays in the documents.
Journal on a separate filesystem.

In addition, here are some other tricks that improve write throughput. With the exception of sharding, we found the improvements to be incremental, whereas we were trying to solve a “this doesn’t work at all” kind of problem, but I’m including them here in case you’re looking for incremental improvements. The 10Gen folks did some testing and got similar results:

Shard.
Break up long arrays into several arrays, so that your overall structure looks more like a nested tree. If you use hour of the day as the key, then the daily log document becomes:
{"0":[...], "1":[...],...,"23":[...]}.
Try manual preallocation. (This didn’t help us. Mongo’s padding seems to work as advertised. My original question was misguided.)
Try different --syncdelay values. (This didn’t help us.)
Try without safe writes. (We were already doing this for the log data, and it’s not possible in many situations. Also, this seems like a bit of a cheat.)
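As a rough illustration of the hour-keyed layout mentioned above, the sketch below builds an empty day document and the matching `$push`. Nesting the hourly arrays under the existing `data` field is my assumption, and the helper names are hypothetical; this is not code from 10Gen.

```javascript
// Hypothetical helpers, for illustration only.

// Build an empty day document with one array per UTC hour, keyed "0".."23".
function buildHourlyDoc(mac, day) {
  var doc = { mac: mac, day: day, data: {} };
  for (var h = 0; h < 24; h++) {
    doc.data[String(h)] = [];
  }
  return doc;
}

// Build the $push that appends an entry to the array for the UTC hour
// of the given timestamp, using dot notation to reach the nested array.
function buildHourlyPush(ts, entry) {
  var push = {};
  push["data." + ts.getUTCHours()] = entry;
  return { $push: push };
}

// Possible mongo shell usage ("logs" is an assumed collection name):
//   db.logs.update({ mac: mac, day: day }, buildHourlyPush(new Date(), "value3"));
```

Each append then touches one short hourly array instead of a single array that grows all day.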

You’ll notice that I’ve copied some of the suggestions from 10Gen here, just for completeness. Hopefully I did so accurately! If they publish a cookbook example, then I’ll post a link here.
