I’m writing a DSL and using a Boost Spirit lexer to tokenize my input. In my grammar, I want a rule similar to this (where tok is the lexer):
    header_block =
        tok.name >> ':' >> tok.stringval > ';' >>
        tok.description >> ':' >> tok.stringval > ';'
        ;
Rather than specifying reserved words for the language (e.g. “name”, “description”) and having to keep them synchronized between the lexer and the grammar, I want to tokenize everything that matches [a-zA-Z_]\w* as a single token type (e.g. tok.symbol) and let the grammar sort it out. If I weren’t using a lexer, I might do something like this:
    stringval = lexeme['"' >> *(char_ - '"') >> '"'];
    header_block =
        lit("name") >> ':' >> stringval > ';' >>
        lit("description") >> ':' >> stringval > ';'
        ;
With a lexer in the mix, I can compile the following rule, but of course it matches more than I want — it doesn’t care about the particular symbol values “name” and “description”:
    header_block =
        tok.symbol >> ':' >> tok.stringval > ';' >>
        tok.symbol >> ':' >> tok.stringval > ';'
        ;
What I’m looking for is something like this:
    header_block =
        specific_symbol_matcher("name") >> ':' >> tok.stringval > ';' >>
        specific_symbol_matcher("description") >> ':' >> tok.stringval > ';'
        ;
Does Qi provide anything I can use in place of my hand-waved specific_symbol_matcher? I’d rather not write my own matcher if I can get close using what’s already provided. If I must write my own matcher, can anyone suggest how to do that?
If the token exposes a std::string, you should just be able to do:
If I understood you right, this is, more or less, what you were asking.
While you are at it, do look at qi::symbols<> and an especially nifty application of that, known as the Nabialek Trick.
Bonus material
In case you’re just struggling to make an existing grammar work with a lexer, here’s what I just did with the calc_utree_ast.cpp example to make it work with a lexer.
It shows
For the input
It prints (without debug info)