We have a collection of log data, where each document in the collection is identified by a MAC address and a calendar day. Basically:
{
  _id: <generated>,
  mac: <string>,
  day: <date>,
  data: [ "value1", "value2" ]
}
Every five minutes, we append a new log entry to the data array within the current day’s document. The document rolls over at midnight UTC when we create a new document for each MAC.
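To make the append concrete, here is a minimal sketch of how the per-day key and the update might be built. It assumes the append is done with $push and that the collection is called logs; both are guesses for illustration, as are the helper names utcDay and appendArgs.

```javascript
// Truncate a timestamp to midnight UTC; this is the value stored in `day`.
function utcDay(ts) {
  return new Date(Date.UTC(ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate()));
}

// Build the query and update documents for appending one log entry.
// Assumption: the append uses $push on the `data` array.
function appendArgs(mac, ts, entry) {
  return {
    query: { mac: mac, day: utcDay(ts) },
    update: { $push: { data: entry } }
  };
}

// In the mongo shell (the collection name `logs` is hypothetical):
//   var a = appendArgs("00:11:22:33:44:55", new Date(), "value3");
//   db.logs.update(a.query, a.update);
```

Every $push makes the document a little larger, which is why it can outgrow its allocated record and get moved.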
We’ve noticed that IO, as measured by bytes written, increases all day long, and then drops back down at midnight UTC. This shouldn’t happen because the rate of log messages is constant. We believe that the unexpected behavior is due to Mongo moving documents, as opposed to updating their log arrays in place. For what it’s worth, stats() shows that the paddingFactor is 1.0299999997858227.
Several questions:
- Is there a way to confirm whether Mongo is updating in place or moving? We see some moves in the slow query log, but this seems like anecdotal evidence. I know I can run db.setProfilingLevel(2), then db.system.profile.find(), and finally look for "moved:true", but I’m not sure whether it’s OK to do this on a busy production system.
- The size of each document is very predictable and regular. Assuming that Mongo is doing a lot of moves, what’s the best way to figure out why Mongo isn’t able to presize more accurately? Or to make Mongo presize more accurately? Assuming that the above description of the problem is right, tweaking the padding factor does not seem like it would do the trick.
- It should be easy enough for me to presize the document and remove any guesswork from Mongo. (I know the padding factor docs say that I shouldn’t have to do this, but I just need to put this issue behind me.) What’s the best way to presize a document? It seems simple to write a document with a garbage byte array field, and then immediately remove that field from the document, but are there any gotchas that I should be aware of? For example, I can imagine having to wait on the server for the write operation (i.e. do a safe write) before removing the garbage field.
- I was concerned about preallocating all of a day’s documents at around the same time because it seems like this would saturate the disk at that time. Is this a valid concern? Should I try to spread out the preallocation costs over the previous day?
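For reference, the ideas in the questions above could be sketched roughly as follows. This is only an illustration under stated assumptions: countMoves, presizeDay and staggerOffsetMs are hypothetical helper names, the padding field name and size are arbitrary, and the functions take the db or collection object as a parameter so that nothing here depends on a particular deployment.

```javascript
// Count profiler entries for one namespace that were flagged as moves.
// Assumes profiling is already enabled and entries carry `moved: true`.
function countMoves(db, ns) {
  return db.system.profile.find({ ns: ns, moved: true }).count();
}

// Presize one day's document: insert it with a throwaway `padding` string,
// then unset that field so the record keeps the space it was allocated.
// `coll` is any object with the shell's insert(doc) and update(query, update).
function presizeDay(coll, mac, day, paddingBytes) {
  var padding = new Array(paddingBytes + 1).join("x"); // paddingBytes characters
  coll.insert({ mac: mac, day: day, data: [], padding: padding });
  // When using a driver, consider waiting for the insert to be acknowledged
  // (a safe write) before issuing this update.
  coll.update({ mac: mac, day: day }, { $unset: { padding: 1 } });
}

// Deterministic offset in [0, windowMs) derived from the MAC, so that
// preallocation for different MACs is spread over a window instead of
// all happening at the same moment.
function staggerOffsetMs(mac, windowMs) {
  var h = 0;
  for (var i = 0; i < mac.length; i++) {
    h = (h * 31 + mac.charCodeAt(i)) >>> 0; // keep the hash within 32 bits
  }
  return h % windowMs;
}

// In the mongo shell (the names `logs` and `mydb.logs` are hypothetical):
//   presizeDay(db.logs, "00:11:22:33:44:55", ISODate("2013-03-06T00:00:00Z"), 4096);
//   countMoves(db, "mydb.logs");
```

A padding size close to the expected end-of-day document size is the natural choice, since the document size is described as predictable; the 4096 in the usage comment is only a placeholder.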
The following combination seems to cause write performance to fall off a cliff:
Presumably I/O becomes saturated. Changing either of these factors seems to prevent this from happening:
In addition, here are some other tricks that improve write throughput. With the exception of sharding, we found the improvements to be incremental, whereas we were trying to solve a “this doesn’t work at all” kind of problem, but I’m including them here in case you’re looking for incremental improvements. The 10Gen folks did some testing and got similar results:
{"0":[...], "1":[...],...,"23":[...]}.
You’ll notice that I’ve copied some of the suggestions from 10Gen here, just for completeness. Hopefully I did so accurately! If they publish a cookbook example, then I’ll post a link here.