Here’s a piece of code I’ve spent the last 2 days optimizing and profiling because it was taking too much time:
{
    // First block: insert the tokens (5000 elements).
    mongo::ScopedDbConnection _dbConnection(DbHost);
    _dbConnection->insert(TokensDB, tokensArray);
    _dbConnection.done();
}
{
    // Second block: insert the postings (20 000 elements).
    mongo::ScopedDbConnection _dbConnection(DbHost);
    _dbConnection->insert(IdxDB, postingsArray);
    _dbConnection.done();
}
Here postingsArray is std::vector<BSON (int64_t, int64_t, int64_t, int)>, 20 000 elements. This insert always takes only a couple of milliseconds. tokensArray is std::vector<BSON (int64_t, std::string)>, 5000 elements. This is the odd insert.
If I do it exactly as in the code fragment above, it takes 45-50 ms. But if I switch the two blocks around as it initially was (insert to IdxDB first and TokensDB second) it takes 400-500 ms. What is going on here? Why does order matter? Why is inserting 5000 2-field records taking much longer than inserting 20k 4-field objects?
My initial idea was that it's because of the std::string field (it holds a single English word, so about 5-7 characters on average). I replaced it with a random int64_t number and saw no noticeable change in insert completion time.
All the profiling is done on a clean database and with exactly the same data every time, so I don't believe it's an error in how I organized the measurements.
MongoDB does a lot of work in the background, so it is normal that the insertion of the large postingsArray takes little time but affects performance afterwards. When you measure the postingsArray insert alone, you are only measuring the time it takes for the MongoDB driver to accept the insert. But when you measure the subsequent operations, you begin to notice the background workload started by the postingsArray insert.
See point 6 there: http://article.gmane.org/gmane.comp.db.mongodb.user/818
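To make that measurement effect concrete, here is a minimal sketch that does not use MongoDB at all. FakeServer, insert_unacknowledged, insert_acknowledged, elapsed_ms and measure are hypothetical names made up for this illustration, and the fixed per-batch cost is an assumption standing in for whatever the real server does in the background.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a server (NOT the MongoDB API): it accepts a
// batch immediately and applies it later on a background thread.
class FakeServer {
public:
    explicit FakeServer(std::chrono::milliseconds cost_per_batch)
        : cost_(cost_per_batch), worker_([this] { run(); }) {}

    ~FakeServer() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();  // the worker drains any remaining batches first
    }

    // Fire-and-forget: returns as soon as the batch is queued.
    void insert_unacknowledged() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++pending_;
        }
        cv_.notify_all();
    }

    // Acknowledged: queue the batch, then block until the queue is empty.
    void insert_acknowledged() {
        insert_unacknowledged();
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || pending_ > 0; });
            if (pending_ == 0) return;  // asked to stop and nothing is left
            lock.unlock();
            std::this_thread::sleep_for(cost_);  // simulated background work
            lock.lock();
            --pending_;
            cv_.notify_all();
        }
    }

    std::chrono::milliseconds cost_;
    std::mutex m_;
    std::condition_variable cv_;
    int pending_ = 0;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};

// Wall-clock time of a callable, in milliseconds.
template <typename F>
long long elapsed_ms(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

struct Timings {
    long long first_ms;   // time to hand off the first batch
    long long second_ms;  // time until the second batch is actually applied
};

Timings measure(std::chrono::milliseconds cost_per_batch) {
    assert(cost_per_batch.count() >= 0);
    FakeServer server(cost_per_batch);
    Timings t{};
    t.first_ms = elapsed_ms([&] { server.insert_unacknowledged(); });
    t.second_ms = elapsed_ms([&] { server.insert_acknowledged(); });
    return t;
}
```

Calling measure(std::chrono::milliseconds(50)) should report a first_ms near zero, while second_ms also includes the first batch's backlog. That is the same shape as a cheap-looking first insert followed by a slow second one.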
BTW, the way your example is written, I would suspect MongoDB gives you the same connection for both inserts. (E.g. you might be taking a connection from the pool, inserting the postingsArray with it, releasing it, then taking the same connection from the pool again and inserting the tokensArray with it.) In that case the TCP/IP socket might still be busy with the postingsArray insert, and what you're seeing might be the second insert hitting the limit on the TCP/IP buffer.
P.S. You might want to change the write concern in order to measure the actual time it takes for MongoDB to perform the insert: http://article.gmane.org/gmane.comp.db.mongodb.user/68288