I’m trying to code a context-sensitive lexer rule using ANTLR, but I can’t get it to do what I need. The rule needs to match one of two alternatives based on characters found at the beginning of the rule. Below is a greatly simplified version of the problem.
Consider this example grammar:
```
lexer grammar X;

options
{
  language = C;
}

RULE :
  SimpleIdent {ctx->someFunction($SimpleIdent);}
  (
    {ctx->test != true}?
    //Nothing
  | {ctx->test == true}?
    SLSpace+ OtherText
  )
  ;

fragment SimpleIdent : ('a'..'z' | 'A'..'Z' | '_')+;
fragment SLSpace : ' ';
fragment OtherText : (~'\n')* '\n';
```
I would expect the lexer to exit this rule if `ctx->test` is false, ignoring any characters after `SimpleIdent`. Unfortunately, ANTLR looks at the character after `SimpleIdent` (`LA(1)`) before it evaluates either predicate, so it always takes the second alternative when that character is a space. The predicate on that alternative then fails and raises an exception instead of letting the first alternative match. This is clearly shown in the generated C code:
```c
// X.g:10:3: ({...}?|{...}? ( SLSpace )+ OtherText )
{
    int alt2=2;
    switch ( LA(1) )
    {
    case '\t':
    case ' ':
        {
            alt2=2;
        }
        break;
    default:
        alt2=1;
    }
    switch (alt2)
    {
    case 1:
        // X.g:11:5: {...}?
        {
            if ( !((ctx->test != true)) )
            {
                //Exception
            }
        }
        break;
    case 2:
        // X.g:13:5: {...}? ( SLSpace )+ OtherText
        {
            if ( !((ctx->test == true)) )
            {
                //Exception
            }
            // ... code that matches ( SLSpace )+ OtherText ...
        }
        break;
    }
}
```
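The decision logic in that excerpt can be reproduced in a few lines of stand-alone C, which makes the failure easy to test in isolation. This is a sketch, not ANTLR output: `predict_rule_alt`, `ALT_NOTHING`, `ALT_OTHER_TEXT` and `PRED_FAILED` are hypothetical names, and ANTLR's failed-predicate exception is modeled as a return code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes: the two alternatives of RULE, plus a
   stand-in for ANTLR's failed-predicate exception. */
enum { ALT_NOTHING = 1, ALT_OTHER_TEXT = 2, PRED_FAILED = -1 };

/* Models the generated decision: the alternative is predicted from
   LA(1) alone, and the validating predicate is checked only afterwards. */
int predict_rule_alt(int la1, bool test)
{
    int alt;

    switch (la1) {
    case '\t':
    case ' ':
        alt = ALT_OTHER_TEXT;
        break;
    default:
        alt = ALT_NOTHING;
        break;
    }
    assert(alt == ALT_NOTHING || alt == ALT_OTHER_TEXT);

    /* {ctx->test != true}? fails when test is true. */
    if (alt == ALT_NOTHING && test)
        return PRED_FAILED;
    /* {ctx->test == true}? fails when test is false; the other
       alternative is never tried. */
    if (alt == ALT_OTHER_TEXT && !test)
        return PRED_FAILED;
    return alt;
}
```

With `test` false and a space after the identifier, `predict_rule_alt(' ', false)` returns `PRED_FAILED`, which is the error described above, while `predict_rule_alt('\n', false)` returns `ALT_NOTHING`.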
How can I force ANTLR to take a specific path in the lexer at runtime?
Use a gated semantic predicate instead of a validating semantic predicate [1]. A validating predicate, `{...}?`, throws an exception if its expression evaluates to false. A gated predicate, `{...}?=>`, is evaluated while the lexer decides which alternative to take, so an alternative whose gate is false is simply not considered. Also, make the “Nothing” alternative the last one to match.

Note that `OtherText` also matches what `SLSpace` matches, which makes `SLSpace+ OtherText` ambiguous. Simply remove `SLSpace+` from it, or let `OtherText` start with something other than a `' '`.

I’m not that familiar with the C target, but this Java demo should work just fine for C (after translating the Java code, of course):
If you’d now parse the input:
you’ll get the following parse:
I.e., whenever a `RULE` starts with a lowercase `"a"`, it doesn’t match all the way to the end of the line.

[1] What is a 'semantic predicate' in ANTLR?
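The effect of the suggested fix can be approximated with a hand-written C function. This is a sketch under stated assumptions, not ANTLR-generated code: `lex_rule`, `is_ident_char` and `some_function` are hypothetical names; `some_function` stands in for `someFunction`, whose body the question does not show, and is assumed to return false exactly when the identifier starts with a lowercase `a`, to match the behavior described above; and `SLSpace+` is assumed to have been removed from the rule, as suggested. What it illustrates is that the gate is consulted before an alternative is committed to, so a false gate selects the empty alternative instead of raising an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SimpleIdent : ('a'..'z' | 'A'..'Z' | '_')+ ; */
static bool is_ident_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/* Hypothetical stand-in for ctx->someFunction(): assumed to switch the
   OtherText alternative off when the identifier starts with 'a'. */
static bool some_function(const char *ident, size_t len)
{
    return !(len > 0 && ident[0] == 'a');
}

/* Returns the length of the RULE token at the start of input, modeling
 *   RULE      : SimpleIdent ( {test}?=> OtherText | ) ;
 *   OtherText : (~'\n')* '\n' ;
 * Returns 0 when input does not start with a SimpleIdent. */
size_t lex_rule(const char *input)
{
    size_t n = 0;
    size_t m;
    bool test;

    assert(input != NULL);

    while (is_ident_char(input[n]))
        n++;
    if (n == 0)
        return 0;

    test = some_function(input, n);

    /* Gated predicate: the gate takes part in the decision, so when it is
       false the OtherText alternative is not viable and the empty
       alternative (listed last) is chosen without any error. */
    if (test) {
        m = n;
        while (input[m] != '\0' && input[m] != '\n')
            m++;
        /* Simplification: without a terminating '\n' this sketch falls
           back to the empty alternative instead of reporting an error. */
        if (input[m] == '\n')
            return m + 1;
    }
    return n;
}
```

For example, `lex_rule("abc def\n")` returns 3, covering only the identifier, whereas `lex_rule("xyz def\n")` returns 8, covering the whole line including the newline.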