Specifically, I am trying to implement a RegExp parser in ANTLR. Here are the

Question

Editorial Team

Asked: June 7, 2026

Specifically, I am trying to implement a RegExp parser in ANTLR.

Here are the relevant parts of my grammar:

grammar JavaScriptRegExp;
options {
    language = 'CSharp3';
}

tokens {
    /* snip */
    QUESTION = '?';
    STAR = '*';
    PLUS = '+';
    L_CURLY = '{';
    R_CURLY = '}';
    COMMA = ',';
}

/* snip */

quantifier returns [Quantifier value]
    :   q=quantifierPrefix QUESTION?
        {
            // A trailing '?' makes the quantifier non-greedy.
            var quant = $q.value;
            quant.Eager = $QUESTION == null;
            $value = quant;
        }
    ;

quantifierPrefix returns [Quantifier value]
    :   STAR { $value = new Quantifier { Min = 0 }; }
    |   PLUS { $value = new Quantifier { Min = 1 }; }
    |   QUESTION { $value = new Quantifier { Min = 0, Max = 1 }; }
    |   L_CURLY min=DEC_DIGITS (COMMA max=DEC_DIGITS?)? R_CURLY
        {
            var minValue = int.Parse($min.text);
            if ($COMMA == null)
            {
                // {n}: exactly n
                $value = new Quantifier { Min = minValue, Max = minValue };
            }
            else if ($max == null)
            {
                // {n,}: at least n, no upper bound
                $value = new Quantifier { Min = minValue, Max = null };
            }
            else
            {
                // {n,m}
                $value = new Quantifier { Min = minValue, Max = int.Parse($max.text) };
            }
        }
    ;

DEC_DIGITS
    :   ('0'..'9')+
    ;

/* snip */

CHAR
    :   ~('^' | '$' | '\\' | '.' | '*' | '+' | '?' | '(' | ')' | '[' | ']' | '{' | '}' | '|')
    ;
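
For readers less used to ANTLR actions, the values the quantifier rules are meant to produce can be sketched in plain Java (ANTLR 3's default target; the grammar above uses the C# target). This is a hypothetical mirror, not generated code: the Quantifier class, its min/max/eager fields and the parse method are illustrative names, with max == null standing for "no upper bound" just as Max = null does in the grammar.

```java
// Hypothetical plain-Java mirror of the grammar's Quantifier; not ANTLR output.
final class Quantifier {
    final int min;
    final Integer max;    // null means "no upper bound", like Max = null in the grammar
    final boolean eager;  // false when the quantifier is followed by '?'

    Quantifier(int min, Integer max, boolean eager) {
        this.min = min;
        this.max = max;
        this.eager = eager;
    }

    // Covers the quantifierPrefix alternatives plus the optional trailing '?' of quantifier.
    static Quantifier parse(String text) {
        boolean eager = true;
        String prefix = text;
        // A lone "?" is itself a prefix; a '?' after a complete prefix makes it non-greedy.
        if (text.length() > 1 && text.endsWith("?")) {
            eager = false;
            prefix = text.substring(0, text.length() - 1);
        }
        switch (prefix) {
            case "*": return new Quantifier(0, null, eager);
            case "+": return new Quantifier(1, null, eager);
            case "?": return new Quantifier(0, 1, eager);
            default: break;
        }
        if (prefix.length() < 3 || !prefix.startsWith("{") || !prefix.endsWith("}")) {
            throw new IllegalArgumentException("not a quantifier: " + text);
        }
        String body = prefix.substring(1, prefix.length() - 1);
        int comma = body.indexOf(',');
        if (comma < 0) {                                    // {n}: exactly n
            int n = Integer.parseInt(body);
            return new Quantifier(n, n, eager);
        }
        int min = Integer.parseInt(body.substring(0, comma));
        String rest = body.substring(comma + 1);
        if (rest.isEmpty()) {                               // {n,}: at least n
            return new Quantifier(min, null, eager);
        }
        return new Quantifier(min, Integer.parseInt(rest), eager); // {n,m}
    }
}
```

For example, Quantifier.parse("{2,5}?") gives min 2, max 5 and eager false, while "??" gives min 0, max 1 and eager false.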

Now, INSIDE of the curly braces, I would like to tokenize ‘,’ as COMMA, but OUTSIDE, I would like to tokenize it as CHAR.

Is this possible?

This is not the only case where this is happening. I will have many other instances where this is a problem (decimal digits, hyphens in character classes, etc.)

EDIT:

I now realize that this is called context-sensitive lexing. Is this possible with ANTLR?

1 Answer

Editorial Team · Answer 1 · 2026-06-07T20:10:25+00:00

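It is possible to do this using gated semantic predicates in the lexer. In the code below, ‘,’ matches the COMMA rule only if isComma is true. Otherwise it matches CHAR, provided CHAR appears after COMMA in the grammar (when both rules can match ‘,’, ANTLR prefers the one defined first). The setComma(), clearComma() and isComma members are ones you would define yourself in the lexer; in a combined grammar an @lexer::members block is the usual place. I don’t know C#, so I can’t give a complete example.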

L_CURLY : '{' {setComma();};
R_CURLY : '}' {clearComma();};
COMMA : {isComma}? => ',';
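
To make the flag's effect concrete, here is a hand-written Java tokenizer that mimics the three rules above. It is not ANTLR output and not the answer's code; Kind, Token, BraceAwareLexer and kinds are hypothetical names, and the isComma field stands in for the answer's flag. Characters other than braces, digits and commas are all lumped into CHAR, which is a simplification of the question's CHAR rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical subset of the grammar's token types.
enum Kind { L_CURLY, R_CURLY, COMMA, DEC_DIGITS, CHAR }

final class Token {
    final Kind kind;
    final String text;
    Token(Kind kind, String text) { this.kind = kind; this.text = text; }
}

// Hand-written stand-in for a generated lexer, illustrating the gated-predicate idea only.
final class BraceAwareLexer {
    private boolean isComma = false; // plays the role of the answer's isComma flag

    List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '{') {
                isComma = true;                    // what setComma() would do
                tokens.add(new Token(Kind.L_CURLY, "{"));
                i++;
            } else if (c == '}') {
                isComma = false;                   // what clearComma() would do
                tokens.add(new Token(Kind.R_CURLY, "}"));
                i++;
            } else if (c == ',' && isComma) {
                // COMMA : {isComma}?=> ',' ; only a candidate while the flag is set
                tokens.add(new Token(Kind.COMMA, ","));
                i++;
            } else if (c >= '0' && c <= '9') {
                int start = i;                     // DEC_DIGITS : ('0'..'9')+ ;
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                tokens.add(new Token(Kind.DEC_DIGITS, input.substring(start, i)));
            } else {
                // Simplification: everything else, including ',' outside braces, becomes CHAR
                tokens.add(new Token(Kind.CHAR, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    // Space-separated token kinds, handy for inspecting the result.
    static String kinds(List<Token> tokens) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Token t : tokens) {
            joiner.add(t.kind.name());
        }
        return joiner.toString();
    }
}
```

Tokenizing a,b{2,3} this way yields CHAR CHAR CHAR L_CURLY DEC_DIGITS COMMA DEC_DIGITS R_CURLY: the first ‘,’ is a CHAR because the flag is still clear, and the one between the braces is a COMMA.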

Obviously, if curly braces are used in other contexts (for example, a literal ‘{’ inside a character class such as [{,}]), this may not work. I recommend avoiding context-sensitive lexing like this unless handling the problem in the parser really becomes messy.
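
The parser-side alternative the answer leans towards, keeping the lexer context-free and letting the parser decide what a ‘,’ means, might look roughly like this. It is a hand-written Java sketch, not ANTLR-generated code, under the assumption that the lexer always emits ‘,’ as a CHAR token; BraceQuantifierParser, Token, Bounds and parseBraces are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch: the lexer is assumed to always emit ',' as a CHAR token,
// and the parser decides what a ',' means from where it appears.
final class BraceQuantifierParser {
    static final class Token {
        final String kind; // one of the grammar's token names, e.g. "L_CURLY" or "CHAR"
        final String text;
        Token(String kind, String text) { this.kind = kind; this.text = text; }
    }

    // Result of parsing {n}, {n,} or {n,m}; max == null means unbounded.
    static final class Bounds {
        final int min;
        final Integer max;
        Bounds(int min, Integer max) { this.min = min; this.max = max; }
    }

    private final List<Token> tokens;
    private int pos = 0;

    BraceQuantifierParser(List<Token> tokens) { this.tokens = tokens; }

    // Same shape as L_CURLY DEC_DIGITS (COMMA DEC_DIGITS?)? R_CURLY, except the separator
    // is "a CHAR token whose text is ','" rather than a dedicated COMMA token.
    Bounds parseBraces() {
        expect("L_CURLY");
        int min = Integer.parseInt(expect("DEC_DIGITS").text);
        Integer max = min;                                   // {n}: exactly n
        if (at("CHAR") && tokens.get(pos).text.equals(",")) {
            pos++;
            max = at("DEC_DIGITS")
                    ? Integer.valueOf(Integer.parseInt(expect("DEC_DIGITS").text)) // {n,m}
                    : null;                                                      // {n,}
        }
        expect("R_CURLY");
        return new Bounds(min, max);
    }

    private boolean at(String kind) {
        return pos < tokens.size() && tokens.get(pos).kind.equals(kind);
    }

    private Token expect(String kind) {
        if (!at(kind)) {
            throw new IllegalStateException("expected " + kind + " at token " + pos);
        }
        return tokens.get(pos++);
    }
}
```

Given the tokens for {2,} (L_CURLY, DEC_DIGITS "2", CHAR ",", R_CURLY), parseBraces() returns min 2 and max null. Because the lexer carries no state in this design, a ‘,’ inside a character class such as [{,}] stays an ordinary CHAR.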
