I’m new to ANTLR, and I’m attempting to write a simple parser using C

Question

Asked: May 17, 2026

I’m new to ANTLR, and I’m attempting to write a simple parser using the C language target (antlr3C). The grammar is simple enough that I’d like to have each rule return a value, e.g.:

number returns [long value]
 :
 ( INT {$value = $INT.ivalue;}
 | HEX {$value = $HEX.hvalue;}
 ) 
 ; 

HEX returns [long hvalue] 
    : '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+  {$hvalue = strtol((char*)$text->chars,NULL,16);}
    ;

INT returns [long ivalue] 
    : '0'..'9'+    {$ivalue = strtol((char*)$text->chars,NULL,10);}
    ;

Each rule collects the return values of its child rules until the topmost rule returns a nice struct full of my data.

As far as I can tell, ANTLR allows lexer rules (tokens, e.g. ‘INT’ and ‘HEX’) to return values just like parser rules (e.g. ‘number’). However, the generated C code will not compile:

error C2228: left of '.ivalue' must have class/struct/union
error C2228: left of '.hvalue' must have class/struct/union

I did some poking around, and the errors make sense – the tokens end up as generic ANTLR3_COMMON_TOKEN_struct, which doesn’t allow for a return value. So maybe the C target just doesn’t support this feature. But like I said, I’m new to this, and before I go haring off to find another approach I want to confirm that I can’t do it this way.

So the question is this: ‘Does antlr3C support return values for lexer rules, and if so, what is the proper way to use them?’

1 Answer

Editorial Team · Answer 1 · 2026-05-17T20:08:39+00:00

Not really any new information, just some details on what @bemace already mentioned.

No, lexer rules cannot have return values. See section 4.3, Rules, of The Definitive ANTLR Reference:

Rule Arguments and Return Values

Just like function calls, ANTLR parser and tree parser rules can have
arguments and return values. ANTLR lexer rules cannot have return
values […]

There are two options:

Option 1

You can do the conversion to a long in the parser rule number:

number returns [long value]
  :  INT {$value = Long.parseLong($INT.text);}
  |  HEX {$value = Long.parseLong($HEX.text.substring(2), 16);}
  ;
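
For the C target the question asks about, the same idea might look roughly like the following. This is a minimal sketch, not taken from the answer's Java examples: token_text_to_long is a hypothetical helper name, and it assumes the token text is available as a NUL-terminated string (for instance via $INT.text->chars, as in the question's own actions). The base is chosen by the rule alternative, 10 for INT and 16 for HEX.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert token text to a long in the given base
 * (10 for INT, 16 for HEX).
 * Returns 0 on success and stores the result in *out.
 * Returns -1 if the text is empty, has trailing non-digit characters,
 * or does not fit in a long.
 * With base 16, strtol itself accepts an optional "0x"/"0X" prefix,
 * so the prefix does not have to be stripped first. */
int token_text_to_long(const char *text, int base, long *out)
{
    char *end = NULL;
    long value;

    assert(text != NULL && out != NULL);

    errno = 0;
    value = strtol(text, &end, base);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return -1;
    }
    *out = value;
    return 0;
}
```

Under those assumptions, the action for the INT alternative could call something like token_text_to_long((const char *)$INT.text->chars, 10, &result) and then assign result to $value. That wiring is untested against the antlr3C runtime.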

Option 2

Or create your own token that has, say, a toLong(): long method:

import org.antlr.runtime.*;

public class YourToken extends CommonToken {

  public YourToken(CharStream input, int type, int channel, int start, int stop) {
    super(input, type, channel, start, stop);
  }

  // your custom method
  public long toLong() {
    String text = super.getText();
    int radix = text.startsWith("0x") ? 16 : 10;
    if(radix == 16) text = text.substring(2);
    return Long.parseLong(text, radix);
  }
}

and specify it in the options {...} section of your grammar, then override the emit(): Token method in your lexer so that it creates this token:

grammar Foo;

options{
  TokenLabelType=YourToken;
}

@lexer::members {
  public Token emit() {
    YourToken t = new YourToken(input, state.type, state.channel, 
        state.tokenStartCharIndex, getCharIndex()-1);
    t.setLine(state.tokenStartLine);
    t.setText(state.text);
    t.setCharPositionInLine(state.tokenStartCharPositionInLine);
    emit(t);
    return t;
  }
}

parse
  :  number {System.out.println("parsed: "+$number.value);} EOF
  ;

number returns [long value]
  :  INT {$value = $INT.toLong();}
  |  HEX {$value = $HEX.toLong();}
  ;

HEX
  :  '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+
  ;

INT
  :  '0'..'9'+
  ;

When you generate a parser and lexer, and run this test class:

import org.antlr.runtime.*;
import java.io.*;

public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("0xCafE");
        FooLexer lexer = new FooLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        FooParser parser = new FooParser(tokens);
        parser.parse();
    }
}

it will produce the following output:

parsed: 51966

The first option seems the more practical in your case.

Note that, as you can see, the examples given are in Java. I have no idea if option 2 is supported in the C target/runtime. I decided to still post it to be able to use it as a future reference here on SO.
