I have a script language based on Antlr: A parser and a tree grammar that builds runtime objects (e.g. statements). When I deal with the statements at runtime, I want to know the original source positions (e.g. when I throw errors, I want to state the line and position in the script source.)
What is the best strategy to attach the source positions to my runtime objects? And if I’m not asking too much, I want to have as little impact on my grammar files as possible.
I have tried to put as little code into the grammar as possible to increase quality, e.g. one of my (many) expressions looks like this:
multiplyExpression returns [Expression value]
    : ^('*' l=expression r=expression)
      {
        $value = sb.newBinaryExpression(CorIdentifier.MULTIPLY, $l.value, $r.value);
      }
    ;
where sb is my ScriptBuilder, which acts as an adapter between the generated code and my runtime. I know I can add the source position as an additional parameter to newBinaryExpression, but then I have to touch all the other expressions as well. I was hoping that I could put the token stream into sb only once and fetch the source position from the stream without affecting the grammar source at all.
Since ANTLR is used by many scripting languages, I was hoping there is a standard way to handle this. Source position handling is a single aspect, and I don't want it cluttered all over the grammar file, which would not be very DRY.
You make it sound like ANTLR does not support this. Sure it does: every CommonToken and CommonTree object exposes public getLine() and getCharPositionInLine() methods, but you discard these instances and create your own nodes (Expression). Don't be surprised that it takes some extra effort to embed this information in your own nodes 🙂
You could let your runtime objects extend CommonTree and let your (combined) grammar construct these custom runtime objects (your classes then inherit the getLine() and getCharPositionInLine() methods). See: Using custom AST node types.
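If extending CommonTree is not an option, the other route mentioned above, embedding the position in your own nodes, could look roughly like the sketch below. It is only an illustration and deliberately has no ANTLR dependency: the Positioned interface is a stand-in for the getLine() and getCharPositionInLine() accessors of CommonTree, and SourcePosition, Literal, Divide and the error() helper are hypothetical names, not part of ANTLR or of your ScriptBuilder.

```java
public class SourcePositionSketch {

    /** Stand-in (assumption) for the two accessors CommonTree and CommonToken expose. */
    interface Positioned {
        int getLine();
        int getCharPositionInLine();
    }

    /** Immutable line/column pair copied out of a tree node (hypothetical helper). */
    static final class SourcePosition {
        final int line;
        final int column;

        SourcePosition(int line, int column) {
            this.line = line;
            this.column = column;
        }

        /** Copies the position so the runtime object no longer needs the tree node. */
        static SourcePosition of(Positioned node) {
            return new SourcePosition(node.getLine(), node.getCharPositionInLine());
        }

        @Override
        public String toString() {
            return line + ":" + column;
        }
    }

    /** Hypothetical runtime base class: every expression remembers where it came from. */
    static abstract class Expression {
        private final SourcePosition position;

        Expression(SourcePosition position) {
            this.position = position;
        }

        SourcePosition getPosition() {
            return position;
        }

        abstract int evaluate();

        /** Builds a runtime error whose message ends with "line:column". */
        IllegalStateException error(String message) {
            return new IllegalStateException(message + " at " + position);
        }
    }

    static final class Literal extends Expression {
        private final int value;

        Literal(SourcePosition position, int value) {
            super(position);
            this.value = value;
        }

        @Override
        int evaluate() {
            return value;
        }
    }

    static final class Divide extends Expression {
        private final Expression left;
        private final Expression right;

        Divide(SourcePosition position, Expression left, Expression right) {
            super(position);
            this.left = left;
            this.right = right;
        }

        @Override
        int evaluate() {
            int divisor = right.evaluate();
            if (divisor == 0) {
                throw error("division by zero");
            }
            return left.evaluate() / divisor;
        }
    }

    /** Fakes a tree node at a fixed position; real code would pass the tree node itself. */
    static Positioned at(final int line, final int column) {
        return new Positioned() {
            @Override public int getLine() { return line; }
            @Override public int getCharPositionInLine() { return column; }
        };
    }

    public static void main(String[] args) {
        Expression ten = new Literal(SourcePosition.of(at(3, 0)), 10);
        Expression zero = new Literal(SourcePosition.of(at(3, 5)), 0);
        Expression division = new Divide(SourcePosition.of(at(3, 3)), ten, zero);
        try {
            division.evaluate();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "division by zero at 3:3"
        }
    }
}
```

The position is copied once at construction time, so the runtime objects stay independent of the tree, but the builder methods still need the node (or its position) passed in from the grammar actions.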