I have a class which represents a character sequence and I’d like to implement an operator >> for it. My implementation currently looks like this:
inline std::istream& operator >>(std::istream& in, seq& rhs) {
    std::copy(
        std::istream_iterator<char>(in),
        std::istream_iterator<char>(),
        std::back_inserter(rhs));
    // `copy` doesn't know when to stop reading so it always also sets `fail`
    // along with `eof`, even if reading succeeded. On the other hand, when
    // reading actually failed, `eof` is not going to be set.
    if (in.fail() and in.eof())
        in.clear(std::ios_base::eofbit);
    return in;
}
However, the following predictably fails:
std::istringstream istr("GATTACA FOO");
seq s;
assert((istr >> s) and s == "GATTACA");
In particular, once we reach the space in “GATTACA FOO”, the copying stops (expected) and sets the failbit on the istream (also expected). However, the read operation actually succeeded as far as seq is concerned.
Can I model this at all using std::copy? I also thought of using an istreambuf_iterator instead but this doesn’t actually solve this particular problem.
What’s more, a read operation on the input “GATTACAFOO” should fail since that input doesn’t represent a valid DNA sequence (which is what my class represents). On the other hand, reading an int from the input 42foo actually succeeds in C++ so maybe I should consider every valid prefix as a valid input?
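For reference, the int behaviour mentioned above can be checked with a short sketch. The helper name `read_int_prefix` is made up for this illustration; it only wraps the standard `operator>>` for `int` and `std::string`.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper for illustration: extracts an int from `text`, then
// reads whatever word is left in the stream into `rest`.
// Returns true if the int extraction itself succeeded.
inline bool read_int_prefix(const std::string& text, int& value, std::string& rest)
{
    std::istringstream in(text);
    if (!(in >> value))
        return false;   // no valid integer prefix at all
    in >> rest;         // the unread remainder, if any
    return true;
}
```

Called with "42foo", the int extraction succeeds with the value 42 and the remaining "foo" is left in the stream for the next read.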
(Incidentally, this would be fairly straightforward with an explicit loop but I’m trying to avoid explicit loops in favour of algorithms.)
You don’t want to `clear(eofbit)` because the `failbit` should stay set if reading failed due to reaching EOF. Otherwise, if you just leave `eofbit` set without `failbit`, then a loop such as `while (in >> s)` will attempt another read after reaching EOF, and then that read will set `failbit` again. Except if it was using your `operator>>` it would clear it, and try to read again. And again. And again. The right behaviour for a stream is to set `failbit` if reading failed because of EOF, so just leave it set.

To do this with iterators and an algorithm you’d need something like a `copy_while` algorithm, which would copy the input sequence only while the predicate was true, but that doesn’t exist in the standard library. You could certainly write one though.
Now you could use that like this:
This works, but the problem is that `istream_iterator` uses `operator>>` to read characters, so it skips over whitespace. This means the space following `"GATTACA"` is consumed by the algorithm and discarded, so code placed after the read that expects to find that space next in the stream would fail.

To solve this, use `istreambuf_iterator`, which doesn’t skip whitespace.

To complete this, you probably want to indicate failure to extract a `seq` if no characters were extracted.

The final version can also use one of my favourite C++11 tricks to simplify it slightly, by using `{}` for the end iterator. The type of the second argument to `copy_while` must be the same as the type of the first argument, which is deduced as `std::istreambuf_iterator<char>`, so the `{}` simply value-initializes another iterator of that same type.

Edit: If you want a closer match to `std::string` extraction then you can do so too, by constructing a `std::istream::sentry` before reading. The sentry will skip leading whitespace and if you reach the end of the input it will set `eofbit`. The other change that should probably be made is to empty the `seq` before pushing anything into it, e.g. start with `rhs.clear()` or equivalent for your `seq` type.