How do we analyze insertion at the back (push_back) of a std::vector? Its amortized time is O(1) per insertion. In particular, in a video on Channel 9 by Stephan T. Lavavej and in this one (17:42 onwards), he says that for optimal performance Microsoft’s implementation of this method grows the capacity of the vector by a factor of around 1.5.
How is this constant determined?
Assuming you mean push_back and not insertion, I believe that the important part is multiplying the capacity by some constant (as opposed to grabbing N more elements each time); as long as you do this, you’ll get amortized constant time. Changing the factor changes the average-case and worst-case performance. Concretely:
If your constant factor is too large, you’ll have good average-case performance but bad worst-case performance, especially as the arrays get big. For instance, imagine doubling (2x) a vector of size 10,000 just because the 10,001st element was pushed. EDIT: As Michael Burr indirectly pointed out, the real cost here is probably that your memory grows much larger than you need it to be. I would add that there are cache issues that affect speed if your factor is too large. Suffice it to say that there are real costs (memory and computation) if you grow much larger than you need.
However, if your constant factor is too small, say 1.1x, then you’ll have good worst-case performance but bad average performance, because you incur the cost of reallocating too many times.
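One way to see which trade-off your own standard library makes is to watch capacity() change while pushing elements. The following is only a rough sketch: the helper name count_reallocations is made up for illustration, and it assumes that every change in capacity() corresponds to one reallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of any library): push n ints into an empty
// std::vector and count how often capacity() changes along the way.
// Assumption: each capacity change corresponds to exactly one reallocation.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);  // sanity check: every element was appended
    return reallocations;
}
```

Calling count_reallocations(1000000) and comparing the result between compilers gives a feel for the growth factor in use: a smaller factor shows up as more reallocations for the same n. The exact counts are implementation-specific, so none are quoted here.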
Also, see Jon Skeet’s answer to a similar question previously. (Thanks @Bo Persson)
A little more about the analysis: Say you have n items you are pushing back and a multiplication factor of M. Then the number of reallocations will be roughly log base M of n (log_M(n)), and the i-th reallocation will cost proportional to M^i (M to the i-th power). The total time of all the push_backs will then be M^1 + M^2 + ... + M^(log_M(n)). This is a geometric series, which reduces to roughly (nM)/(M-1) in the limit. The number of push_backs is n, so dividing the series by n gives the amortized cost per push_back, which is roughly a constant, M/(M-1).
For large values of M you will overshoot a lot and allocate much more than you need reasonably often (which I mentioned above). For small values of M (close to 1) this constant M/(M-1) becomes large. This factor directly affects the average time.
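To check the M/(M-1) estimate numerically, here is a small simulation sketch. It is a toy model under stated assumptions, not how any real std::vector is implemented: the function name relocations_per_push is invented, the buffer starts at capacity 1, the new capacity is the old one times the factor (truncated, and bumped by one if truncation would not grow it), and the only cost counted is the number of existing elements moved on each reallocation.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (an assumption, not a real std::vector): simulate n appends into
// a buffer that multiplies its capacity by `factor` whenever it is full, and
// return the average number of already-stored elements moved per append.
double relocations_per_push(std::size_t n, double factor) {
    assert(n > 0 && factor > 1.0);
    std::size_t capacity = 1;
    std::size_t size = 0;
    unsigned long long moved = 0;  // total elements relocated so far
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            // Grow geometrically; fall back to +1 when truncation stalls growth.
            std::size_t grown =
                static_cast<std::size_t>(static_cast<double>(capacity) * factor);
            capacity = grown > capacity ? grown : capacity + 1;
            moved += size;  // every existing element is moved to the new buffer
        }
        ++size;
    }
    return static_cast<double>(moved) / static_cast<double>(n);
}
```

In this model the average should stay in the neighbourhood of M/(M-1) or below it for a given factor M, so comparing relocations_per_push(n, 2.0), relocations_per_push(n, 1.5) and relocations_per_push(n, 1.1) for a large n illustrates how the constant blows up as M approaches 1. Truncating to whole elements means small-capacity steps deviate a little from the idealized series.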