I’m new to ANTLR, and I’m attempting to write a simple parser using the C language target (antlr3C). The grammar is simple enough that I’d like to have each rule return a value, e.g.:
number returns [long value]
:
( INT {$value = $INT.ivalue;}
| HEX {$value = $HEX.hvalue;}
)
;
HEX returns [long hvalue]
: '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+ {$hvalue = strtol((char*)$text->chars,NULL,16);}
;
INT returns [long ivalue]
: '0'..'9'+ {$ivalue = strtol((char*)$text->chars,NULL,10);}
;
Each rule collects the return values of its child rules until the topmost rule returns a nice struct full of my data.
As far as I can tell, ANTLR allows lexer rules (tokens, eg ‘INT’ & ‘HEX’) to return values just like parser rules (eg ‘number’). However, the generated C code will not compile:
error C2228: left of '.ivalue' must have class/struct/union
error C2228: left of '.hvalue' must have class/struct/union
I did some poking around, and the errors make sense – the tokens end up as generic ANTLR3_COMMON_TOKEN_struct, which doesn’t allow for a return value. So maybe the C target just doesn’t support this feature. But like I said, I’m new to this, and before I go haring off to find another approach I want to confirm that I can’t do it this way.
So the question is this: ‘Does antlr3C support return values for lexer rules, and if so what is the proper way to use them?’
Not really any new information, just some details on what @bemace already mentioned.
No, lexer rules cannot have return values. See 4.3 Rules from The Definitive ANTLR reference:
There are two options:
Option 1
You can do the transforming to a long in the parser rule number:
Option 2
Or create your own token that has, say, a toLong(): long method:
and define in the options {...} header in your grammar to use this token and override the emit(): Token method in your lexer class:
When you generate a parser and lexer, and run this test class:
it will produce the following output:
The first option seems the more practical in your case.
Note that, as you can see, the examples given are in Java. I have no idea if option 2 is supported in the C target/runtime. I decided to still post it to be able to use it as a future reference here on SO.
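For the C target, the conversion described in Option 1 could be sketched roughly as follows. This is only an illustration: lexeme_to_long is a hypothetical helper name, not part of the ANTLR runtime, and it assumes the token text has the shape defined by the INT and HEX rules in the question.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of the ANTLR3 C runtime): converts the text
 * of an INT or HEX token, as defined in the question's grammar, to a long. */
long lexeme_to_long(const char *text)
{
    /* HEX lexemes start with "0x" per the HEX rule; anything else is
     * treated as decimal. The short-circuit keeps this safe for "" and "0". */
    int base = (text[0] == '0' && text[1] == 'x') ? 16 : 10;

    /* With base 16, strtol accepts and skips the optional "0x" prefix. */
    return strtol(text, NULL, base);
}
```

In the grammar, HEX and INT would then be plain lexer rules without a returns clause or action, and each alternative of the parser rule number would do the conversion itself, for example {$value = lexeme_to_long((const char*)$INT.text->chars);}. That action assumes $INT.text in a parser rule yields the same string type as $text does in the question's lexer actions.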