I have a lexical rule (Integer) which uses some fragments. In a parser rule


Asked: June 8, 2026

I have a lexical rule (Integer) which uses some fragments. In a parser rule (parse) I want to rewrite my tree differently depending on which fragment generated the token in question. I have made a small grammar to demonstrate what I’m attempting:

grammar subrange;

options {
    output=AST;
}

tokens {
    NumberNode;
    DecimalNode;
    BinaryNode;
    HexNode;
    OctalNode;
}

parse
    : Integer+ -> ^(NumberNode Integer)+
    ;

Integer
    : DECIMAL_LITERAL
    | BINARY_LITERAL
    | HEX_LITERAL
    | OCTAL_LITERAL
    ;

fragment BINARY_LITERAL
    : '2#' ('0' | '1')+
    ;

fragment HEX_LITERAL 
    : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
    ;

fragment HEX_DIGIT
    : (DIGIT|'a'..'f'|'A'..'F')
    ;

fragment DECIMAL_LITERAL 
    : ('0' | '1'..'9' DIGIT*)
    ;

fragment OCTAL_LITERAL 
    : '8#' ('0'..'7')+
    ;

fragment DIGIT
    : '0'..'9'
    ;

SPACE : (' ' | '\t' | '\r' | '\n')+ {skip();};

I want the parse rule to rewrite a DECIMAL_LITERAL under an imaginary DecimalNode but a BINARY_LITERAL under a BinaryNode (rather than everything under a NumberNode).

I’m attempting to do this by changing the token type inside the lexical rule so that I can then rewrite accordingly inside the parse rule.

I think I should be able to do this with an action, but I have been unable to figure out how to get hold of the returned token in order to change its type. http://www.antlr.org/wiki/display/ANTLR3/Special+symbols+in+actions seems to indicate that $tokenref should work, but it doesn’t get translated at all.

Or is there another way to accomplish this?

Thanks in advance.


1 Answer

Editorial Team · Answer 1 · 2026-06-08T02:52:14+00:00

It seems a bit odd to me to group all these literals under a single Integer token and then want to separate them again in a parser rule.

Why not just remove Integer and do:

integer
    : BINARY_LITERAL // when output=AST, this creates a CommonTree with type 'BINARY_LITERAL'
    | HEX_LITERAL    // ...
    | DECIMAL_LITERAL
    | OCTAL_LITERAL 
    ;

BINARY_LITERAL
    : '2#' ('0' | '1')+
    ;

HEX_LITERAL 
    : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
    ;

DECIMAL_LITERAL 
    : ('0' | '1'..'9' DIGIT*)
    ;

OCTAL_LITERAL 
    : '8#' ('0'..'7')+
    ;

?

Or you could keep the Int(eger) rule and have it replace the text of each token with the literal's decimal value by doing:

Int
@init{int skip = 0, base = 10;}
    : ( DECIMAL_LITERAL
      | BINARY_LITERAL  {base = 2;  skip = 2;} 
      | OCTAL_LITERAL   {base = 8;  skip = 2;} 
      | HEX_LITERAL     {base = 16; skip = $text.contains("#") ? 3 : 2;} 
      )
      {
        setText(String.valueOf(Integer.parseInt($text.substring(skip), base)));
      }
    ;

fragment BINARY_LITERAL
    : '2#' ('0' | '1')+
    ;

fragment HEX_LITERAL 
    : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
    ;

fragment DECIMAL_LITERAL 
    : ('0' | '1'..'9' DIGIT*)
    ;

fragment OCTAL_LITERAL 
    : '8#' ('0'..'7')+
    ;
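
The conversion done by the action in the Int rule can be tried outside the lexer. The following is only a minimal sketch: the class LiteralValue and the method toDecimal are hypothetical names, and the prefix checks stand in for "which alternative matched", which the lexer knows without inspecting the text.

```java
// Hypothetical stand-alone helper; not part of the ANTLR-generated lexer.
final class LiteralValue {

    // Mirrors the Int rule's action: pick a base and a prefix length,
    // then parse the remaining digits in that base.
    static int toDecimal(String text) {
        int skip = 0;
        int base = 10;
        if (text.startsWith("2#")) {            // BINARY_LITERAL
            base = 2;
            skip = 2;
        } else if (text.startsWith("8#")) {     // OCTAL_LITERAL
            base = 8;
            skip = 2;
        } else if (text.startsWith("16#")) {    // HEX_LITERAL, '16#' form
            base = 16;
            skip = 3;
        } else if (text.startsWith("0x") || text.startsWith("0X")) {
            base = 16;                          // HEX_LITERAL, '0x' form
            skip = 2;
        }
        // Throws NumberFormatException if the value does not fit in an int.
        return Integer.parseInt(text.substring(skip), base);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2#1111", "8#77", "0xff", "16#ff", "123"}) {
            System.out.println(s + " -> " + toDecimal(s));
        }
    }
}
```

With this sketch, "2#1111" becomes 15, "8#77" becomes 63, and both "0xff" and "16#ff" become 255. Literals too large for an int make Integer.parseInt throw, so a real grammar may want Long or BigInteger instead.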

Be careful not to give a rule a name that clashes with a class or reserved word of the target language (Integer, in the case of Java).

EDIT

Okay. I’ll leave my other answer there in case passers-by are wondering why on earth I’m proposing this… 🙂

Here’s what (I think) you’re after:

grammar T;

options {
  output=AST;
}

tokens {
  BinaryNode;
  OctalNode;
  HexNode;
  DecimalNode;
}

parse
 : integer+
 ;

integer
 : i=Integer -> {$i.text.startsWith("2#")}?         ^(BinaryNode Integer)
             -> {$i.text.startsWith("8#")}?         ^(OctalNode Integer)
             -> {$i.text.matches("(16#|0[xX]).*")}? ^(HexNode Integer)
             ->                                     ^(DecimalNode Integer)
 ;

Integer
 : DECIMAL_LITERAL
 | BINARY_LITERAL
 | HEX_LITERAL
 | OCTAL_LITERAL
 ;

fragment BINARY_LITERAL
 : '2#' ('0' | '1')+
 ;

fragment HEX_LITERAL 
 : ('16#' | '0' ('x'|'X')) HEX_DIGIT+
 ;

fragment HEX_DIGIT
 : (DIGIT|'a'..'f'|'A'..'F')
 ;

fragment DECIMAL_LITERAL 
 : ('0' | '1'..'9' DIGIT*)
 ;

fragment OCTAL_LITERAL 
 : '8#' ('0'..'7')+
 ;

fragment DIGIT
 : '0'..'9'
 ;

SPACE 
 : (' ' | '\t' | '\r' | '\n')+ {skip();}
 ;

Parsing the input "2#1111 8#77 0xff 16#ff 123" will result in the following AST:

[Image: the resulting AST]

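Since the Integer token no longer carries the information about which kind of literal it was, you will have to do this check in the integer rule (the -> {boolean-expression}? ... predicates that guard each rewrite alternative). The predicates are tried in order, and the last rewrite, which has no predicate, acts as the default.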
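
The checks in the integer rule are ordinary Java boolean expressions on the token text, so they can be exercised on their own. This is only an illustrative sketch: NodeKind and classify are hypothetical names, and the returned strings merely echo the imaginary token names from the grammar above.

```java
// Hypothetical helper that applies the same tests, in the same order,
// as the predicates in the integer rule.
final class NodeKind {

    static String classify(String text) {
        if (text.startsWith("2#")) {
            return "BinaryNode";
        }
        if (text.startsWith("8#")) {
            return "OctalNode";
        }
        // String.matches tests the whole string, hence the trailing ".*".
        if (text.matches("(16#|0[xX]).*")) {
            return "HexNode";
        }
        return "DecimalNode"; // default, like the rewrite without a predicate
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2#1111", "8#77", "0xff", "16#ff", "123"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

For the sample input this yields BinaryNode, OctalNode, HexNode, HexNode and DecimalNode, in that order.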
