For my very own little parser framework, I am trying to define (something like) the following function:
    template <class T>
    // with operator>>( std::istream&, T& )
    void tryParse( std::istream& is, T& tgt )
    {
        is >> tgt /* , *BUT* store every character that is consumed by this operation
        in some string. If afterwards, is.fail() (which should indicate a parsing
        error for now), put all the characters read back into the 'is' stream so that
        we can try a different parser. */
    }
Then I could write something like this: (maybe not the best example)
    /* grammar: MyData = <IntTriple> | <DoublePair>
       DoublePair = <double> <double>
       IntTriple = <int> <int> <int> */
    class MyData
    {
    public:
        union { DoublePair dp; IntTriple it; } data;
        bool isDoublePair;
    };

    istream& operator>>( istream& is, MyData& md )
    {
        /* If I used just "is >> md.data.it" here instead, the
           operator>>( ..., IntTriple ) might consume two ints, then hit an
           unexpected character, and fail, making it impossible to read these two
           numbers as doubles in the "else" branch below. */
        tryParse( is, md.data.it );
        if ( !is.fail() )
            md.isDoublePair = false;
        else
        {
            md.isDoublePair = true;
            is.clear();
            is >> md.data.dp;
        }
        return is;
    }
Any help is greatly appreciated.
Unfortunately, streams have only very minimal and rudimentary putback support.
The last few times I needed this, I wrote my own reader classes which wrapped a stream but had a buffer to put things back into, and which read from the stream only when that buffer was empty. You could ask such a reader for a state object, and then either commit that state or roll back to an earlier one.
The default action in the state class’ destructor was to rollback, so that you could parse ahead without giving much thought to error handling, because an exception would simply rollback the parser’s state up to a point where a different grammar rule was tried. (I think this is called backtracking.) Here’s a sketch:
This should give you an idea. It has none of the implementation, but that was straightforward and should be easy to redo. Also, the real code had many convenience functions, such as ones that read a delimited string, consumed a string if it was one of several given keywords, or read a string and converted it to a type given as a template parameter.
The idea was that a function would set the error index to its starting position, save the parse state, and try to parse until it either succeeded or ran into a dead end. In the latter case, it would just throw an exception. This would destroy the parse_state objects on the stack, rolling back the state up to a function which could catch the exception and either try something else or output an error (which is where get_error_string() comes in).
If you want a really fast parser, this strategy might be wrong, but then streams are often too slow, too. OTOH, the last time I used something like this, I made an XPath parser that operates on a proprietary DOM, which is used to represent scenes in a 3D renderer. And it was not the XPath parser that got all the heat from the guys trying to get higher frame rates.
:)
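To make the rollback-by-destructor idea concrete, here is a minimal sketch of how such a reader could look, applied to the IntTriple / DoublePair grammar from the question. Only parse_state is a name taken from the description above; reader, parse_error, read_token, read_value and describe are made up for illustration, and this is not the original code. It also simplifies things: the reader keeps everything it has read in one string instead of using a separate putback buffer, tokens are assumed to be whitespace-separated, and the error index and get_error_string() are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical reader: remembers every character pulled from the stream,
// so the read position can be moved back later.
class reader {
public:
    explicit reader(std::istream& is) : is_(is) {}
    // Returns the next character, or -1 at end of input.
    int get() {
        if (pos_ == buf_.size()) {
            const int c = is_.get();
            if (c == std::char_traits<char>::eof()) return -1;
            buf_.push_back(static_cast<char>(c));
        }
        return static_cast<unsigned char>(buf_[pos_++]);
    }
    std::size_t position() const { return pos_; }
    void seek(std::size_t pos) { assert(pos <= buf_.size()); pos_ = pos; }
private:
    std::istream& is_;
    std::string buf_;      // everything read from the stream so far
    std::size_t pos_ = 0;  // current read position inside buf_
};

// Rolls the reader back on destruction unless commit() was called.
class parse_state {
public:
    explicit parse_state(reader& r) : r_(r), start_(r.position()) {}
    ~parse_state() { if (!committed_) r_.seek(start_); }
    parse_state(const parse_state&) = delete;
    parse_state& operator=(const parse_state&) = delete;
    void commit() { committed_ = true; }
private:
    reader& r_;
    std::size_t start_;
    bool committed_ = false;
};

struct parse_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Skips blanks, then reads characters up to the next blank or end of input.
std::string read_token(reader& r) {
    int c = r.get();
    while (c == ' ' || c == '\t' || c == '\n') c = r.get();
    std::string tok;
    while (c != -1 && c != ' ' && c != '\t' && c != '\n') {
        tok.push_back(static_cast<char>(c));
        c = r.get();
    }
    if (tok.empty()) throw parse_error("unexpected end of input");
    return tok;
}

// Reads one token and converts all of it to T, or throws parse_error.
template <class T>
T read_value(reader& r) {
    parse_state state(r);
    std::istringstream conv(read_token(r));
    T value;
    if (!(conv >> value) || conv.peek() != std::char_traits<char>::eof())
        throw parse_error("token has the wrong type");
    state.commit();
    return value;
}

struct IntTriple { int a, b, c; };
struct DoublePair { double a, b; };

IntTriple parse_int_triple(reader& r) {
    parse_state state(r);  // rewinds to the start if any read below throws
    IntTriple t;
    t.a = read_value<int>(r);
    t.b = read_value<int>(r);
    t.c = read_value<int>(r);
    state.commit();
    return t;
}

DoublePair parse_double_pair(reader& r) {
    parse_state state(r);
    DoublePair p;
    p.a = read_value<double>(r);
    p.b = read_value<double>(r);
    state.commit();
    return p;
}

// MyData = <IntTriple> | <DoublePair>: try the first rule, then the second.
std::string describe(const std::string& input) {
    std::istringstream is(input);
    reader r(is);
    try {
        parse_int_triple(r);
        return "IntTriple";
    } catch (const parse_error&) {
        // Stack unwinding has already rewound the reader to the start.
    }
    parse_double_pair(r);  // a parse_error here propagates to the caller
    return "DoublePair";
}
```

With this, describe("1 2 3") picks the IntTriple rule, while describe("1 2.5") first consumes "1" as an int, fails on "2.5", rewinds, and then succeeds with the DoublePair rule. Note that the buffer in this sketch never shrinks; a real reader would probably discard characters once the outermost state has been committed.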