I’m writing a parser/interpreter for a C-like language and I need to interpret escaped

Question

Asked: May 26, 2026 (2026-05-26T01:55:53+00:00)

I’m writing a parser/interpreter for a C-like language and I need to interpret escaped characters. One of them is the Unicode escape sequence with the pattern “\uXXXX”, where each X is a hex digit.

My ANTLR rules look like this:

public char returns [char c] 
    : '\\"' { $c = '"'; } 
    | '\\\\' { $c = '\\'; }
    | '\\/' { $c = '/'; }
    | '\\b' { $c = '\b'; }
    | '\\f' { $c = '\f'; }
    | '\\n' { $c = '\n'; }
    | '\\r' { $c = '\r'; }
    | '\\t' { $c = '\t'; }
    | '\\u' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT { $c = 'e'; }
    | ~('\\' | '"') { $c = '/'; }
    ;

fragment HEXDIGIT
    : ('0'..'9'|'a'..'f'|'A'..'F')
    ;

I’m feeding it the string “\u1234”, for which I expect an ‘e’, but I’m getting a ‘/’ instead, which is what the fallback alternative for everything else produces.

Is there some magic juju going on with fragments and rules or something that I’m not aware of?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T01:55:53+00:00

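As mentioned by Adam, char is currently a parser rule (its name starts with a lowercase letter), but it should be a lexer rule. Fragment rules such as HEXDIGIT never produce tokens of their own and can only be used from other lexer rules, so the \u alternative in a parser rule has no HEXDIGIT tokens to match. Once char is a lexer rule, it can no longer return a char, because lexer rules always produce a Token.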

You can change the text of a token by calling setText(...) in the lexer rule's action, like this (assuming Java is the target language):

// lexer rules start with a capital!
Char
  :  '\\"'                                     { setText("\""); } 
  |  '\\\\'                                    { setText("\\"); } 
  |  '\\/'                                     { setText("/"); } 
  |  '\\b'                                     { setText("\b"); } 
  |  '\\f'                                     { setText("\f"); } 
  |  '\\n'                                     { setText("\n"); } 
  |  '\\r'                                     { setText("\r"); } 
  |  '\\t'                                     { setText("\t"); } 
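  |  '\\u' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT
     {
       // getText() is the raw matched input: backslash, 'u', four hex digits
       String hex = getText();
       // drop the two-character prefix and parse the digits as base 16
       int i = Integer.parseInt(hex.substring(2), 16);
       // demo only: put the parsed value into the token text
       setText(hex + " base 10 = " + i);
     }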
  |  ~('\\' | '"')
  ;

fragment HEXDIGIT
  :  ('0'..'9'|'a'..'f'|'A'..'F')
  ;
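
The action in the grammar above only demonstrates that the hex digits can be parsed. To get the actual character the question asks for, the parsed value can be cast to a char. Here is a minimal Java sketch of that conversion, assuming the input is the raw text of a single escape sequence that the lexer has already validated. The class name EscapeDecoder and the method decodeEscape are hypothetical helpers for illustration, not part of ANTLR.

```java
// Hypothetical helper, not part of ANTLR: maps the raw text of one escape
// sequence to the character it denotes.
final class EscapeDecoder {
    private EscapeDecoder() {}

    static char decodeEscape(String text) {
        if (text.length() < 2 || text.charAt(0) != '\\') {
            throw new IllegalArgumentException("not an escape sequence: " + text);
        }
        char kind = text.charAt(1);
        if (kind == 'u') {
            // Assumption: the lexer has already checked that exactly four
            // hex digits follow the two-character prefix.
            if (text.length() != 6) {
                throw new IllegalArgumentException("expected 4 hex digits: " + text);
            }
            // Parse the digits as base 16 and narrow to a UTF-16 code unit.
            return (char) Integer.parseInt(text.substring(2), 16);
        }
        if (text.length() != 2) {
            throw new IllegalArgumentException("unexpected trailing text: " + text);
        }
        switch (kind) {
            case '"':  return '"';
            case '\\': return '\\';
            case '/':  return '/';
            case 'b':  return '\b';
            case 'f':  return '\f';
            case 'n':  return '\n';
            case 'r':  return '\r';
            case 't':  return '\t';
            default:
                throw new IllegalArgumentException("unknown escape: " + text);
        }
    }
}
```

Inside the lexer action, the same idea would probably amount to replacing the demo call with something like setText(String.valueOf((char) i)), so that the token text becomes the decoded character itself.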
