I have a question concerning the difference between a “normal” C++ string and a string of unsigned characters.
When generating some pseudorandom strings of chars and of unsigned chars, I noticed a huge performance difference between the time the code would need to build a normal string and one composed of unsigned chars.
The code I used:
    #include <cstdlib>      // rand, srand
    #include <tr1/random>
    #include <string>

    using namespace std;
    using namespace tr1;

    typedef basic_string<unsigned char> ustring;

    // Builds a 1000-character string of pseudorandom chars.
    string generateString(){
        string retStr;
        char a;
        for(unsigned int i = 0; i < 1000; i++){
            a = rand();
            retStr += a;
        }
        return retStr;
    }

    // Same as above, but with unsigned chars.
    ustring generateUString(){
        ustring retStr;
        unsigned char a;
        for(unsigned int i = 0; i < 1000; i++){
            a = rand();
            retStr += a;
        }
        return retStr;
    }

    int main(int argc, char* args[]){
        srand(0);
        string thing;
        ustring uthing;
        for(unsigned int i = 1; i < 100000; i++){
            //thing = generateString(); // this needs 2 seconds to execute
            uthing = generateUString(); // and this 13
        }
        return 0;
    }
So basically, the code needs 2 seconds to execute generateString() 100 000 times, while it needs 13 seconds to execute generateUString() 100 000 times.
What exactly is the reason for this? I guess it’s the += operator, since the difference melted away when I cut the corresponding lines (actually, generateUstring() seems to be faster then, I guess because the modulo arithmetic is easier in that case).
But why then is it so much faster to append a char to a string than to append an unsigned char to a string of unsigned chars? And should I hence avoid strings of unsigned chars?
The reason is probably that basic_string<char> is explicitly instantiated in libstdc++.so, which (by default) is compiled with -O2. So if you don't compile your program with optimization, the basic_string<unsigned char> operations will be unoptimized, but all the basic_string<char> operations that aren't inlined will use the optimized code in libstdc++.so.
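One way to check this is to time both string types from the same templated code and compare a build without optimization against one with -O2. The following is only a sketch: it assumes a C++11 compiler and libstdc++ (where basic_string<unsigned char> compiles through the generic char_traits), and buildString and timeBuilds are hypothetical helper names, not part of the original program.

```cpp
#include <chrono>
#include <cstdlib>
#include <string>

// Build a 1000-character pseudorandom string one append at a time,
// mirroring generateString()/generateUString() from the question.
template <typename CharT>
std::basic_string<CharT> buildString() {
    std::basic_string<CharT> result;
    for (unsigned int i = 0; i < 1000; ++i) {
        result += static_cast<CharT>(std::rand());
    }
    return result;
}

// Hypothetical helper: run `rounds` builds and return the elapsed
// wall-clock time in milliseconds.
template <typename CharT>
double timeBuilds(unsigned int rounds) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t total = 0;
    for (unsigned int i = 0; i < rounds; ++i) {
        total += buildString<CharT>().size();
    }
    const auto stop = std::chrono::steady_clock::now();
    // Store the accumulated size in a volatile so the optimizer
    // cannot discard the loop at -O2.
    volatile std::size_t sink = total;
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeBuilds<char>(100000) and timeBuilds<unsigned char>(100000) from main and printing the results, once compiled without optimization and once with -O2, should show whether the gap closes. If the explanation above is right, the two timings should be much closer in the optimized build.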