I would like to know the performance overhead of the following:
    string line, word;
    while (std::getline(cin, line))
    {
        istringstream istream(line);
        while (istream >> word)
        {
            // parse word here
        }
    }
I think this is the standard C++ way to tokenize input.
To be specific:
- Is each line copied three times: first by getline, then by the istream constructor, and finally by operator>> for each word?
- Would frequent construction and destruction of istream be an issue? What's the equivalent implementation if I define istream before the outer while loop?
Thanks!
Update:
An equivalent implementation
    string line, word;
    stringstream stream;
    while (std::getline(cin, line))
    {
        stream.clear();
        stream << line;
        while (stream >> word)
        {
            // parse word here
        }
    }
uses the stream as a local buffer: lines are pushed in and words are popped out.
This would get rid of the possibly frequent constructor and destructor calls in the previous version, and make use of the stream's internal buffering (is this point correct?).
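One way to reuse a single stream without letting its buffer grow with every line is to replace the buffer contents with str() and then reset the error flags with clear(). The following is only a minimal sketch: the function name tokenize_reusing_stream and the choice to collect the words into a vector are assumptions made for illustration, not part of the code above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects every whitespace-separated word read from
// `in`, reusing one istringstream for all lines.
std::vector<std::string> tokenize_reusing_stream(std::istream& in)
{
    std::vector<std::string> words;
    std::string line, word;
    std::istringstream stream;      // constructed once, outside the loop

    while (std::getline(in, line))
    {
        stream.str(line);           // replace the buffer contents (copies line)
        stream.clear();             // reset eofbit/failbit left by the previous line
        while (stream >> word)
            words.push_back(word);  // stands in for "parse word here"
    }
    return words;
}
```

Calling tokenize_reusing_stream(std::cin) would process standard input in the same way as the loops above. Note that str() still copies the line into the stream, so this removes the repeated construction and destruction but not the copy.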
Alternative solutions might be to extend std::string to support operator<< and operator>>, or to extend iostream to support something like locate_new_line. Just brainstorming here.
Unfortunately, iostreams is not designed for performance-intensive work. The problem is not copying things in memory (copying strings is fast); it is virtual function dispatch, potentially to the tune of several indirect function calls per character.
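If the per-word stream extraction does turn out to be a bottleneck, one possible alternative is to split each line directly with the std::string search functions, which involves no stream at all. This is a sketch under assumptions: the name split_words is hypothetical, and the fixed whitespace set below approximates the default "C" locale rather than consulting the stream's locale as operator>> does. Whether it is actually faster for a given workload would need to be measured.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on whitespace without using a stream.
std::vector<std::string> split_words(const std::string& line)
{
    // Assumed whitespace set (matches isspace in the "C" locale).
    static const char* const kSpace = " \t\r\n\v\f";

    std::vector<std::string> words;
    std::size_t begin = line.find_first_not_of(kSpace);
    while (begin != std::string::npos)
    {
        std::size_t end = line.find_first_of(kSpace, begin);
        // substr clamps the count, so end == npos takes the rest of the line.
        words.push_back(line.substr(begin, end - begin));
        begin = line.find_first_not_of(kSpace, end);
    }
    return words;
}
```

The outer std::getline loop stays the same; only the inner istringstream loop would be replaced by a call to split_words(line).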
As for your question about copying: yes, as written, everything gets copied when you initialize a new stringstream. (Characters also get copied from the stream to the output string by getline or >>, but that obviously can't be prevented.)
Using C++11's move facility (std::move), you can eliminate the extraneous copies, provided the standard library offers a stream constructor that accepts an rvalue string; for std::istringstream that constructor was only standardized in C++20.
All that said, performance is only an issue if a measurement tool tells you it is. Iostreams is flexible and robust, and filebuf is basically fast enough, so you can prototype the code so it works and then optimize the bottlenecks without rewriting everything.
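As an illustration of the move-based approach mentioned above, the sketch below passes each line to the stream with std::move. The function name tokenize_moving_lines and the vector return value are assumptions for illustration. The code compiles under C++11 and later, but the line's buffer is only actually handed over (rather than copied) when the library provides the rvalue-string constructor of std::istringstream, which is a C++20 addition.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: tokenizes `in`, moving each line into the stream.
std::vector<std::string> tokenize_moving_lines(std::istream& in)
{
    std::vector<std::string> words;
    std::string line, word;

    while (std::getline(in, line))
    {
        // With a C++20 library this selects the rvalue-string constructor and
        // transfers the line's buffer; before C++20 the argument binds to the
        // const std::string& constructor and is copied as before.
        std::istringstream stream(std::move(line));

        // `line` is now in a valid but unspecified state; the next
        // std::getline call erases it before reading, so reuse is safe.
        while (stream >> word)
            words.push_back(word);  // stands in for "parse word here"
    }
    return words;
}
```

The stream is still constructed and destroyed once per line, so this addresses only the copy into the stream, not the construction cost.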