I’m having a weird problem with the following function, which returns a string with all the characters in it after a certain point:
string after(int after, string word) {
    char temp[word.size() - after];
    cout << word.size() - after << endl; //output here is as expected
    for(int a = 0; a < (word.size() - after); a++) {
        cout << word[a + after]; //and so is this
        temp[a] = word[a + after];
        cout << temp[a]; //and this
    }
    cout << endl << temp << endl; //but output here does not always match what I want
    string returnString = temp;
    return returnString;
}
The thing is, when the returned string is 7 chars or less, it works just as expected. When the returned string is 8 chars or more, then it starts spewing nonsense at the end of the expected output. For example, the lines
cout << after(1, "12345678") << endl;
cout << after(1, "123456789") << endl;
give the following output:
7
22334455667788
2345678
2345678
8
2233445566778899
23456789�,�D~
23456789�,�D~
What can I do to fix this error, and are there any default C++ functions that can do this for me?
The behavior you’re describing is what you’d expect if you copy the characters into the buffer but forget to add a null character at the end to terminate the string. Try adding a null character after the loop, and make sure you allocate enough space (one more character) for it. Or, better, use the string constructor overload that accepts not just a char * but also a length. Or, even better, use std::string::substr: it will be easier and probably more efficient.
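Both alternatives might look roughly like the sketch below. The function names after_substr and after_ptr_len are made up for illustration, and I've used std::size_t for the index instead of the int in the question; treat it as a starting point rather than a drop-in replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical replacement for after(): everything from index `start` onward.
// substr throws std::out_of_range if start > word.size().
std::string after_substr(std::size_t start, const std::string& word) {
    return word.substr(start);
}

// Same result via the (pointer, length) constructor, which copies exactly
// `count` characters and never goes looking for a null terminator.
std::string after_ptr_len(std::size_t start, const std::string& word) {
    assert(start <= word.size());  // precondition: start must be in range
    const std::size_t count = word.size() - start;
    return std::string(word.data() + start, count);
}
```

With either version, after_substr(1, "123456789") and after_ptr_len(1, "123456789") should both yield "23456789", with no intermediate char buffer to terminate.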
BTW, you don’t need an after function, since exactly what you want already exists on the string class.
Now, to answer your specific question about why this only showed up on the 8th and later characters, it’s important to understand how “C” strings work. A “C” string is a sequence of bytes terminated by a null (0) character. Library functions (like the string constructor taking a char * that you use to copy temp into a string instance) start reading from the first character (temp[0]) and keep reading until the end, where “the end” is the first null character, not the size of the memory allocation. For example, if temp is 6 characters long and you fill up all 6 characters, then a library function reading that string to “the end” will read the first 6 characters and then keep going (past the end of the allocated memory!) until it finds a null character or the program crashes (e.g. due to trying to access an invalid memory location).
Sometimes you may get lucky: if temp was 6 characters long and the first byte in memory after the end of your allocation happened to be a zero, then everything would work fine. If, however, the byte after the end of your allocation happened to be non-zero, then you’d see garbage characters. It’s not truly random (often the same bytes will be there every time, since they’re filled by operations like previous function calls that are consistent from run to run of your program), but if you’re accessing uninitialized memory there’s no way of knowing what you’ll find there. In a bounds-checking environment (e.g. Java, C#, or std::string::at in C++), an attempt to read beyond the bounds of an allocation will throw an exception. But “C” strings don’t know where their end is, leaving them vulnerable to problems like the one you saw, or to more nefarious problems like buffer overflows.
Finally, a logical follow-up question you’d probably ask: why exactly 8 bytes? Since you’re trying to access memory that you didn’t allocate and didn’t initialize, what’s in that RAM is whatever the previous user of that RAM left there. On 32-bit and 64-bit machines, memory is generally allocated in 4- or 8-byte chunks. So it’s likely that the previous user of that memory location stored 8 bytes of zeros there (e.g. one 64-bit integer zero). But the next location in memory had something different left there by the previous user. Hence your garbage characters.
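To make the "first null character, not the size of the allocation" point concrete, here is a small sketch that stays within defined behavior by putting the null inside the buffer instead of reading past it. The buffer kBuffer and the function name are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical 8-byte buffer with a null character in the middle.
const char kBuffer[8] = {'a', 'b', 'c', '\0', 'x', 'y', 'z', '\0'};

// Shows that "the end" depends on who is reading the buffer.
void show_where_readers_stop() {
    // strlen scans until the first null, ignoring the array's real size.
    assert(std::strlen(kBuffer) == 3);
    // The char * constructor stops at the same place.
    assert(std::string(kBuffer) == "abc");
    // The (pointer, length) constructor copies exactly 7 chars, null included.
    assert(std::string(kBuffer, 7).size() == 7);
}
```

In the question's code the situation is reversed: there is no null inside temp at all, so the char * constructor keeps scanning into whatever memory follows it.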
Moral of the story: when using “C” strings, be very careful about your null terminators and buffer lengths!
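If you do want to keep the manual copy loop, a minimal sketch of the "one extra character plus a terminator" fix could look like this. The name after_terminated is hypothetical, and I've swapped the variable-length array from the question for a std::vector<char>, which is my own substitution rather than anything the original code requires.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fixed version of after() that keeps the manual copy loop.
std::string after_terminated(std::size_t start, const std::string& word) {
    assert(start <= word.size());  // precondition: start must be in range
    const std::size_t count = word.size() - start;

    // One extra element so there is room for the null terminator.
    std::vector<char> temp(count + 1);
    for (std::size_t i = 0; i < count; ++i) {
        temp[i] = word[start + i];
    }
    temp[count] = '\0';  // explicit terminator, so readers know where to stop

    // The char * constructor now stops at the terminator we wrote.
    return std::string(temp.data());
}
```

Called as after_terminated(1, "123456789"), this should return "23456789" without trailing garbage, because the buffer length and the terminator now agree.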