How to use lexer rules having the same start?

Question

Editorial Team

Asked: June 1, 2026 (2026-06-01T12:43:29+00:00)

I am trying to use two similar lexer rules that begin with the same characters:

TIMECONSTANT: ('0'..'9')+ ':' ('0'..'9')+;
INTEGER     : ('0'..'9')+;
COLON       : ':';

Here is my sample grammar:

grammar TestTime;

text      : (timeexpr | caseblock)*;

timeexpr  : TIME;
caseblock : INT COLON ID;

TIME      : ('0'..'9')+ ':' ('0'..'9')+;
INT       : ('0'..'9')+;
COLON     : ':';
ID        : ('a'..'z')+;

WS        : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

When I try to parse this text:

12:44
123 : abc
123: abc

The first two lines are parsed correctly, but the third generates an error.
For some reason, ANTLR tries to tokenize '123:' as TIME (which it is not).

So, is it possible to write a grammar with such lexemes?

My language needs rules like these because it has both case blocks and date/time constants. For example, it is possible to write:

case MyInt of
  1: a := 01.01.2012;
  2: b := 12:44;
  3: ....
end;

1 Answer

Editorial Team · Answer 1 · 2026-06-01T12:43:30+00:00

As soon as DIGIT+ ':' is matched, the lexer expects it to be followed by another DIGIT so that it can complete a TIMECONSTANT. If no digit follows, the lexer reports an error: no other lexer rule matches DIGIT+ ':', and the lexer will not give back the ':' it has already consumed in order to match an INTEGER instead.
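To make that behaviour concrete, here is a small hand-written Java sketch of a lexer that commits to the time rule as soon as it has seen digits followed by ':'. It is only a simplified model of the behaviour described above, not the code ANTLR generates; the class name CommittedLexerSketch and the token spellings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only: commits to TIME once DIGIT+ ':' has been read
// and never gives the ':' back. Not ANTLR's generated code.
class CommittedLexerSketch {
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private static boolean isLetter(char c) { return c >= 'a' && c <= 'z'; }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') { i++; continue; }
            int start = i;
            if (isDigit(c)) {
                while (i < n && isDigit(input.charAt(i))) i++;
                if (i < n && input.charAt(i) == ':') {
                    i++; // the ':' is consumed here and cannot be returned
                    if (i >= n || !isDigit(input.charAt(i))) {
                        throw new IllegalStateException(
                            "expected a digit after '" + input.substring(start, i) + "'");
                    }
                    while (i < n && isDigit(input.charAt(i))) i++;
                    tokens.add("TIME(" + input.substring(start, i) + ")");
                } else {
                    tokens.add("INT(" + input.substring(start, i) + ")");
                }
            } else if (c == ':') {
                i++;
                tokens.add("COLON");
            } else if (isLetter(c)) {
                while (i < n && isLetter(input.charAt(i))) i++;
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else {
                throw new IllegalStateException("unexpected character '" + c + "'");
            }
        }
        return tokens;
    }
}
```

With this model, "12:44" and "123 : abc" tokenize without trouble, while "123: abc" throws, because the space after the ':' is not a digit and the ':' has already been consumed.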

A possible solution would be to optionally match ':' DIGIT+ at the end of the INTEGER rule and change the type of the token if this gets matched:

grammar T;

parse
 : (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
 ;

INTEGER      : DIGIT+ ((':' DIGIT)=> ':' DIGIT+ {$type=TIMECONSTANT;})?;
COLON        : ':';
SPACE        : ' ' {skip();};

fragment DIGIT : '0'..'9';
fragment TIMECONSTANT : ;

When parsing the input:

11: 12:13 : 14

the following will be printed:

INTEGER         '11'
COLON           ':'
TIMECONSTANT    '12:13'
COLON           ':'
INTEGER         '14'
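For readers who want to see what the predicate is doing without generating a lexer, the same idea can be sketched in plain Java: after the digits, peek at the next two characters and only consume the ':' when a digit follows it. This is an illustrative hand-written equivalent under that assumption, not the code ANTLR produces; LookaheadLexerSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written equivalent of the INTEGER rule with its (':' DIGIT)=> predicate.
class LookaheadLexerSketch {
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (c == ' ') { i++; continue; }                  // SPACE is skipped
            if (c == ':') { tokens.add("COLON ':'"); i++; continue; }
            if (!isDigit(c)) {
                throw new IllegalArgumentException("unexpected character '" + c + "'");
            }
            int start = i;
            while (i < n && isDigit(input.charAt(i))) i++;    // DIGIT+
            String type = "INTEGER";
            // Peek without consuming: is the next input ':' followed by a digit?
            if (i + 1 < n && input.charAt(i) == ':' && isDigit(input.charAt(i + 1))) {
                i++;                                          // now it is safe to take the ':'
                while (i < n && isDigit(input.charAt(i))) i++;
                type = "TIMECONSTANT";                        // same effect as $type=TIMECONSTANT
            }
            tokens.add(type + " '" + input.substring(start, i) + "'");
        }
        return tokens;
    }
}
```

Calling LookaheadLexerSketch.tokenize("11: 12:13 : 14") yields the same five tokens as listed above, in the same order.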

EDIT

Not too nice, but works…

True. However, this is not an ANTLR shortcoming: most lexer generators I know will have a problem properly tokenizing such a TIMECONSTANT (when INTEGER and COLON are also present). ANTLR at least facilitates a way to handle it in the lexer 🙂

You could also let this be handled by the parser instead of the lexer:

time_const : INTEGER COLON INTEGER;
INTEGER    : '0'..'9'+;
COLON      : ':';
SPACE      : ' ' {skip();};

However, if your language's lexer ignores whitespace, then input like:

12 :    34

would also be matched by the time_const rule, of course.
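If that matters, one way around it is to check that the three tokens are adjacent in the input before accepting them as a time constant. The sketch below shows only that check, using a hypothetical Tok class with start and stop character offsets. In ANTLR the check would probably live in a semantic predicate on time_const that compares the tokens' start/stop indexes, but treat that wiring as an assumption rather than tested grammar code.

```java
// Adjacency check for INTEGER COLON INTEGER; Tok is a hypothetical stand-in
// for whatever token class the parser actually uses.
class AdjacencySketch {
    static final class Tok {
        final String type;
        final int start; // offset of the first character
        final int stop;  // offset of the last character (inclusive)

        Tok(String type, int start, int stop) {
            this.type = type;
            this.start = start;
            this.stop = stop;
        }
    }

    // True only when the tokens have the right types and no characters
    // (such as skipped spaces) lie between them.
    static boolean isTimeConst(Tok a, Tok colon, Tok b) {
        return a.type.equals("INTEGER")
            && colon.type.equals("COLON")
            && b.type.equals("INTEGER")
            && a.stop + 1 == colon.start
            && colon.stop + 1 == b.start;
    }
}
```

For "12:34" the offsets are 0-1, 2-2 and 3-4, so the check passes; for "12 :    34" they are 0-1, 3-3 and 8-9, so it fails and the input is not treated as a time constant.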
