I am currently working on an application for a low-memory platform that requires a std::set of many short strings (more than 100,000 strings of 4 to 16 characters each). I recently switched this set from std::string to const char * to save memory, and I wondered how much per-string overhead I was actually avoiding.
I tried using the following:
std::string sizeTest = "testString";
std::cout << sizeof(sizeTest) << " bytes";
But it printed only 4 bytes, suggesting that the object contains just a pointer. I’m well aware that strings store their data in a char * internally, but I thought the string class would have additional overhead.
Does the GCC implementation of std::string incur more overhead than sizeof(std::string) would indicate? More importantly, is that overhead significant for a data set of this size?
Here are the sizes of relevant types on my platform (it is 32-bit and has 8 bits per byte):
char: 1 bytes
void *: 4 bytes
char *: 4 bytes
std::string: 4 bytes
Well, at least with GCC 4.4.5, which is what I have handy on this machine, std::string is a typedef for std::basic_string<char>, and basic_string is defined in /usr/include/c++/4.4.5/bits/basic_string.h. There’s a lot of indirection in that file, but what it comes down to is that nonempty std::strings store a pointer into a heap block that begins with a header (libstdc++'s _Rep_base, holding _M_length, _M_capacity and _M_refcount), followed in memory by the actual string data. So std::string is going to have at least three words of overhead for each string, plus any overhead from having a higher capacity than length (probably none, depending on how you construct your strings; you can check by calling the capacity() method).
There’s also going to be overhead from your memory allocator for doing lots of small allocations. I don’t know what GCC uses for C++, but assuming it’s similar to the dlmalloc allocator it uses for C, that could be at least two words per allocation, plus some space to align the size to a multiple of at least 8 bytes.