I’m trying to write a grammar for a language which allows the following expressions:
- Function calls of the form f args (note: no parentheses!)
- Addition (and more complex expressions, but that’s not the point here) of the form a + b
For example:
f 42 => f(42)
42 + b => (42 + b)
f 42 + b => f(42 + b)
The grammar is unambiguous (every expression can be parsed in exactly one way), but I don’t know how to write it as a PEG, since both productions can start with the same token, id. Here is my incorrect PEG. How can I rewrite it to make it valid?
expression ::= call / addition
call       ::= id addition*
addition   ::= unary
               ( ('+' unary)
               / ('-' unary) )*
unary      ::= primary
             / '(' ( ('+' unary)
                   / ('-' unary)
                   / expression)
               ')'
primary    ::= number / id
number     ::= [1-9]+
id         ::= [a-z]+
Now, when this grammar tries to parse the input “a + b” it parses “a” as a function call with zero arguments and chokes on “+ b”.
I’ve uploaded a C++ / Boost.Spirit.Qi implementation of the grammar in case anybody wants to play with it.
(Note that unary disambiguates unary operations and additions: In order to call a function with a negative number as an argument, you need to specify parentheses, e.g. f (-1).)
As proposed in chat you could start out with something like:
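One possible starting point is sketched here as a hand-written recursive-descent parser in plain C++17 rather than as a Spirit grammar. It requires at least one argument in call (id addition+ instead of id addition*), so a bare id fails as a call and the ordered choice falls back to addition. This is an assumed fix for illustration, not necessarily the grammar proposed in chat. Parser and parse are hypothetical names, and the signed forms such as (-1) are left out to keep it short.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical hand-written PEG-style parser; not the Spirit grammar from the post.
// Each rule returns the rendered expression, or std::nullopt with `pos` restored.
struct Parser {
    std::string in;
    std::size_t pos = 0;

    void skip() {
        while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
    }

    // Matches one literal character after optional whitespace.
    bool lit(char c) {
        std::size_t save = pos;
        skip();
        if (pos < in.size() && in[pos] == c) { ++pos; return true; }
        pos = save;
        return false;
    }

    // Matches a non-empty run of characters in the range [lo, hi].
    std::optional<std::string> run(char lo, char hi) {
        std::size_t save = pos;
        skip();
        std::size_t start = pos;
        while (pos < in.size() && in[pos] >= lo && in[pos] <= hi) ++pos;
        if (pos == start) { pos = save; return std::nullopt; }
        return in.substr(start, pos - start);
    }

    std::optional<std::string> number() { return run('1', '9'); }  // number <- [1-9]+
    std::optional<std::string> id() { return run('a', 'z'); }      // id     <- [a-z]+

    // primary <- number / id
    std::optional<std::string> primary() { auto n = number(); return n ? n : id(); }

    // unary <- primary / '(' expression ')'   (signed forms like (-1) omitted for brevity)
    std::optional<std::string> unary() {
        if (auto p = primary()) return p;
        std::size_t save = pos;
        if (lit('(')) {
            if (auto e = expression(); e && lit(')')) return e;
        }
        pos = save;
        return std::nullopt;
    }

    // addition <- unary (('+' / '-') unary)*
    std::optional<std::string> addition() {
        auto acc = unary();
        if (!acc) return std::nullopt;
        for (;;) {
            std::size_t save = pos;
            char op = lit('+') ? '+' : lit('-') ? '-' : '\0';
            std::optional<std::string> rhs;
            if (op) rhs = unary();
            if (!rhs) { pos = save; return acc; }
            acc = "(" + *acc + " " + op + " " + *rhs + ")";
        }
    }

    // call <- id addition+   (one or more arguments, unlike `id addition*`)
    std::optional<std::string> call() {
        std::size_t save = pos;
        auto name = id();
        if (!name) return std::nullopt;
        std::string args;
        while (auto arg = addition()) {
            // addition() wraps sums in parentheses; drop that outer pair in an argument list.
            if (arg->front() == '(') arg = arg->substr(1, arg->size() - 2);
            args += (args.empty() ? "" : ", ") + *arg;
        }
        if (args.empty()) { pos = save; return std::nullopt; }
        return *name + "(" + args + ")";
    }

    // expression <- call / addition   (ordered choice: call is tried first)
    std::optional<std::string> expression() {
        auto c = call();
        return c ? c : addition();
    }
};

// Parses the whole input; returns std::nullopt if it does not match completely.
std::optional<std::string> parse(const std::string& text) {
    Parser p{text};
    auto result = p.expression();
    p.skip();
    if (p.pos != p.in.size()) return std::nullopt;
    return result;
}
```

With this sketch, parse("f 42 + b") yields f(42 + b) and parse("a + b") yields (a + b): on "a + b", call consumes a, finds no argument, backtracks, and addition takes over.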
Since then, I have implemented this in C++ with an AST representation, so you can get a feel for how this grammar actually builds the expression tree by pretty-printing it.
Grammar:
The corresponding AST structures are defined quick-and-dirty using the very powerful Boost Variant:
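A minimal sketch of what such structures might look like. It uses std::variant (C++17) with std::shared_ptr for the recursive alternatives as a stand-in for boost::variant and boost::recursive_wrapper, so that it compiles without Boost. The names expr_t, call_t, and render, as well as the field layout of addition_t, are assumptions and not the definitions from the full code.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <variant>
#include <vector>

struct call_t;
struct addition_t;

// An expression is a number, an identifier, a call, or an addition.
// std::shared_ptr supplies the indirection that recursive alternatives need.
using expr_t = std::variant<int, std::string,
                            std::shared_ptr<call_t>,
                            std::shared_ptr<addition_t>>;

struct call_t     { std::string name; std::vector<expr_t> args; };
struct addition_t { expr_t lhs; char op; expr_t rhs; };

std::string render(const expr_t& e);

// Visitor with one overload per alternative of expr_t.
struct printer {
    std::string operator()(int n) const { return std::to_string(n); }
    std::string operator()(const std::string& id) const { return id; }
    std::string operator()(const std::shared_ptr<call_t>& c) const {
        std::string out = c->name + "(";
        for (std::size_t i = 0; i < c->args.size(); ++i)
            out += (i ? ", " : "") + render(c->args[i]);
        return out + ")";
    }
    std::string operator()(const std::shared_ptr<addition_t>& a) const {
        return "(" + render(a->lhs) + " " + a->op + " " + render(a->rhs) + ")";
    }
};

// Pretty-prints an expression tree as a fully parenthesized string.
std::string render(const expr_t& e) { return std::visit(printer{}, e); }
```

Building a call_t named f whose single argument is the addition 42 + b and passing it to render gives f((42 + b)); the inner parentheses come from the addition node itself.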
In the full code, I’ve also overloaded operator<< for these structures.
Full Demo
Alternative:
I have an alternative version that builds addition_t iteratively instead of recursively, so to say:
This removes the need to use Phoenix to build the expression:
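To illustrate the difference, here is a small sketch that assumes "iteratively" means addition_t stores a first operand plus a flat list of (operator, operand) pairs instead of nested left and right nodes. That layout matches the attribute a Qi rule of the form operand >> *(op >> operand) would naturally produce, which is presumably why no Phoenix actions are needed. The field names, the render function, and the use of plain strings as operands are hypothetical simplifications, not the code from the demo.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat layout: a first operand followed by (operator, operand) pairs.
struct addition_t {
    std::string first;
    std::vector<std::pair<char, std::string>> rest;
};

// Folds the flat list from the left into a fully parenthesized string,
// which recovers the left-associative grouping of a nested tree.
std::string render(const addition_t& a) {
    std::string acc = a.first;
    for (const auto& [op, operand] : a.rest)
        acc = "(" + acc + " " + op + " " + operand + ")";
    return acc;
}
```

For example, an addition_t holding a followed by (+, b) and (-, c) renders as ((a + b) - c), while one with an empty list renders as just its first operand.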