I’ve been comparing an STL implementation of a popular XmlRpc library with an implementation that mostly avoids the STL. The STL implementation is much slower: I got the run time down from 47s to 4.5s. I’ve diagnosed some of the reasons. Partly it’s std::string being misused (e.g. the author should have used ‘const std::string&’ wherever possible; don’t just use std::strings as if they were Java strings). But it’s also because copy constructors were called each time the vector outgrew its buffer, which was exceedingly often. The copy constructors were very slow because they did deep copies of trees (of XmlRpc values).
I was told by someone else on StackOverflow that std::vector implementations typically double the size of the buffer each time they outgrow it. This does not seem to be the case in Visual Studio 2008: adding 50 items to a std::vector took 177 calls of the copy constructor. Doubling each time should call the copy constructor 64 times. If you were very concerned about keeping memory usage low, then increasing by 50% each time should call the copy constructor 121 times. So where does the 177 come from?
My question is: (a) why is the copy constructor called so often? (b) is there any way to avoid using the copy constructor if you’re just moving an object from one location to another? (In this case, and indeed in most cases, a memcpy() would have sufficed, and this makes a BIG difference.)
(NB: I know about vector::reserve(), I’m just a bit disappointed that application programmers would need to implement the doubling trick when something like this is already part of any good STL implementation.)
My test program:
    #include <string>
    #include <iostream>
    #include <vector>
    using namespace std;

    // Global counters (zero-initialized) for each kind of call.
    int constructorCalls;
    int assignmentCalls;
    int copyCalls;

    class C {
        int n;
    public:
        C(int _n) { n = _n; constructorCalls++; }
        C(const C& orig) { copyCalls++; n = orig.n; }
        void operator=(const C &orig) { assignmentCalls++; n = orig.n; }
    };

    int main(int argc, char* argv[])
    {
        std::vector<C> A;
        //A.reserve(50);
        for (int i=0; i < 50; i++)
            A.push_back(i);
        // String literals need double quotes; single quotes denote character literals.
        cout << "constructor calls = " << constructorCalls << "\n";
        cout << "assignment calls = " << assignmentCalls << "\n";
        cout << "copy calls = " << copyCalls << "\n";
        return 0;
    }
The STL does tend to cause this sort of thing. The spec doesn’t allow memcpy’ing because that doesn’t work in all cases. There’s a document describing EASTL, a bunch of alterations made by EA to make it more suitable for their purposes, which does have a method of declaring that a type is safe to memcpy. Unfortunately it’s not open source AFAIK so we can’t play with it.
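IIRC the Dinkumware STL (the one in VS) grows vectors by 50% each time. That would account for your 177: the capacity goes 1, 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, so the reallocations copy 1+2+3+4+6+9+13+19+28+42 = 127 elements, and each of the 50 push_back calls copies its argument once more, giving 127 + 50 = 177.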
However, doing a series of push_back calls on a vector is a common inefficiency. You can either use reserve to alleviate it (at the cost of possibly wasting memory if you overestimate significantly) or use a different container: deque performs better for a series of insertions like that but is a little slower in random access, which may or may not be a good tradeoff for you.
Or you could look at storing pointers instead of values, which makes resizing much cheaper if you’re storing large objects: only the pointers are copied when the vector grows, never the objects themselves, and you also save the one copy that each item would otherwise incur on insertion.