I have an input string I’m trying to parse. It might look like any of the following:
sys(error1, 2.3%)
sys(error2 , 2.4%)
sys(this error , 3%)
Note that there is sometimes a space before the comma. In my grammar (Boost Spirit library) I’d like to capture “error1”, “error2”, and “this error” respectively.
Here is the original grammar I had to capture this – which absorbed the space at the end of the name:
name_string %= lexeme[+(char_ - ',' - '"')];
name_string.name("Systematic Error Name");
start = (lit("sys")|lit("usys")) > '('
> name_string[boost::phoenix::bind(&ErrorValue::SetName, _val, _1)] > ','
> errParser[boost::phoenix::bind(&ErrorValue::CopyErrorAndRelative, _val, _1)]
> ')';
My attempt to fix this was first:
name_string %= lexeme[*(char_ - ',' - '"') > (char_ - ',' - '"' - ' ')];
However, that completely failed. It looks like it fails to parse anything with a space in the middle.
I’m fairly new with Spirit – so perhaps I’m missing something simple. Looks like lexeme turns off skipping on the leading edge – I need something that does it on the leading and trailing edge.
Thanks in advance for any help!
Thanks to psur below, I was able to put together an answer. It isn’t perfect (see below), but I thought I would update the post for everyone to see it in context and nicely formatted:
qi::rule<Iterator, std::string(), ascii::space_type> name_word;
qi::rule<Iterator, std::string(), ascii::space_type> name_string;
ErrorValueParser<Iterator> errParser;
name_word %= +(qi::char_("_a-zA-Z0-9+"));
//name_string %= lexeme[name_word >> *(qi::hold[+(qi::char_(' ')) >> name_word])];
name_string %= lexeme[+(qi::char_("-_a-zA-Z0-9+")) >> *(qi::hold[+(qi::char_(' ')) >> +(qi::char_("-_a-zA-Z0-9+"))])];
start = (
lit("sys")[bind(&ErrorValue::MakeCorrelated, _val)]
|lit("usys")[bind(&ErrorValue::MakeUncorrelated, _val)]
)
>> '('
>> name_string[bind(&ErrorValue::SetName, _val, _1)] >> *qi::lit(' ')
>> ','
>> errParser[bind(&ErrorValue::CopyErrorAndRelative, _val, _1)]
>> ')';
This works! The key to this is name_string, and in it qi::hold, an operator I was not familiar with before this. It is almost like a sub-rule: it parses into a copy of the attribute and keeps the result only if everything inside qi::hold[…] parses successfully. So, above, a space after a word is only accepted if another word follows. The result is that if a sequence of words ends in one or more spaces, those trailing spaces do not become part of the name. They are absorbed by the *qi::lit(' ') that follows (see the start rule).
There are two things I’d like to figure out how to improve here:
- It would be nice to put the actual string parsing into name_word. The problem is the declaration of name_word: it fails when it is put in the appropriate spot in the definition of name_string.
- It would be even better if name_string could include the parsing of the trailing spaces, though its return value did not. I think I know how to do that…
When/if I figure these out I will update this post. Thanks for the help!
The rules below should work for you:
name_word parses only one word of the name; I assumed that it contains only letters, digits and underscores.
In the start rule, qi::hold is important. It will parse a space only if what follows is a name_word. Otherwise the parser will roll back and move on to *qi::lit(' ') and then to the comma.