In a C++ question about optimization and code style, several answers referred to “SSO” in the context of optimizing copies of std::string. What does SSO mean in that context?
Clearly not “single sign on”. “Shared string optimization”, perhaps?
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
Background / Overview
Operations on automatic variables (“from the stack”, which are variables that you create without calling
malloc/new) are generally much faster than those involving the free store (“the heap”, which are variables that are created usingnew). However, the size of automatic arrays is fixed at compile time, but the size of arrays from the free store is not. Moreover, the stack size is limited (typically a few MiB), whereas the free store is only limited by your system’s memory.SSO is the Short / Small String Optimization. A
std::stringtypically stores the string as a pointer to the free store (“the heap”), which gives similar performance characteristics as if you were to callnew char [size]. This prevents a stack overflow for very large strings, but it can be slower, especially with copy operations. As an optimization, many implementations ofstd::stringcreate a small automatic array, something likechar [20]. If you have a string that is 20 characters or smaller (given this example, the actual size varies), it stores it directly in that array. This avoids the need to callnewat all, which speeds things up a bit.EDIT:
I wasn’t expecting this answer to be quite so popular, but since it is, let me give a more realistic implementation, with the caveat that I’ve never actually read any implementation of SSO “in the wild”.
Implementation details
At the minimum, a
std::stringneeds to store the following information:The size could be stored as a
std::string::size_typeor as a pointer to the end. The only difference is whether you want to have to subtract two pointers when the user callssizeor add asize_typeto a pointer when the user callsend. The capacity can be stored either way as well.You don’t pay for what you don’t use.
First, consider the naive implementation based on what I outlined above:
For a 64-bit system, that generally means that
std::stringhas 24 bytes of ‘overhead’ per string, plus another 16 for the SSO buffer (16 chosen here instead of 20 due to padding requirements). It wouldn’t really make sense to store those three data members plus a local array of characters, as in my simplified example. Ifm_size <= 16, then I will put all of the data inm_sso, so I already know the capacity and I don’t need the pointer to the data. Ifm_size > 16, then I don’t needm_sso. There is absolutely no overlap where I need all of them. A smarter solution that wastes no space would look something a little more like this (untested, example purposes only):I’d assume that most implementations look more like this.