consider following rules in the parser:
expression
: IDENTIFIER
| (...)
| procedure_call // e.g. (foo 1 2 3)
| macro_use // e.g. (xyz (some datum))
;
procedure_call
: '(' expression expression* ')'
;
macro_use
: '(' IDENTIFIER datum* ')'
;
and
// Note that any string that parses as an <expression> will also parse as a <datum>.
datum
: simple_datum
| compound_datum
;
simple_datum
: BOOLEAN
| NUMBER
| CHARACTER
| STRING
| IDENTIFIER
;
compound_datum
: list
| vector
;
list
: '(' (datum+ ( '.' datum)?)? ')'
| ABBREV_PREFIX datum
;
fragment ABBREV_PREFIX
: ('\'' | '`' | ',' | ',@')
;
vector
: '#(' datum* ')'
;
the procedure_call and macro_rule alternative in the expression rule generate an non-LL(*) structure error. I can see the problem, since (IDENTIFIER) will parse as both. but even when i define both with + instead of *, it generates the error, even though above example shouldn’t be parsing anymore.
i came up with the usage of syntactic predicates, but i can’t figure out how to use them to do the trick here.
something like
expression
: IDENTIFIER
| (...)
| (procedure_call)=>procedure_call // e.g. (foo 1 2 3)
| macro_use // e.g. (xyz (some datum))
;
or
expression
: IDENTIFIER
| (...)
| ('(' IDENTIFIER expression)=>procedure_call // e.g. (foo 1 2 3)
| macro_use // e.g. (xyz (some datum))
;
doesnt work either, since none but the first rule will match anything. is there a proper way to solve that?
I found a JavaCC grammar of R5RS which I used to (quickly!) write an ANTLR equivalent:
which can be tested with the following class:
and to generate a lexer & parser, compile all Java source files and run the main class, do:
The fact that nothing is being printed on the console means the parser (and lexer) didn’t find any errors with the provided source.
Note that I have no Unit tests and have only tested the single Scheme source inside the
Mainclass. If you find errors in the ANTLR grammar, I’d appreciate to hear about them so I can fix the grammar. In due time, I’ll probably commit the grammar to the official ANTLR Wiki.