i want to parse something like this in my lexer:
( begin expression )
where expressions are also surrounded by brackets. it isn’t important what is in the expression, i just want to have all what’s between the (begin and the matching ) as a token. an example would be:
(begin
(define x (+ 1 2)))
so the text of the token should be
(define x (+ 1 2)))
something like
PROGRAM : LPAREN BEGIN .* RPAREN;
does (obviously) not work because as soon as he sees a “)”, he thinks the rule is over, but i need the matching bracket for this.
how can i do that?
Inside lexer rules, you can invoke rules recursively. So, that’s one way to solve this. Another approach would be to keep track of the number of open- and close parenthesis and let a gated semantic predicate loop as long as your counter is more than zero.
A demo:
T.g
Main.java
Note that you’ll need to beware of string literals inside your source that might include parenthesis:
or comments that may contain parenthesis.
The suggestion with the predicate uses some language specific code (Java, in this case). An advantage of calling a lexer rule recursively is that you don’t have custom code in your lexer: