I seem to be struggling with the AST->StringTemplate side of things, probably because I’m coming from writing parsers by hand -> LLVM.
What I’m looking for is a way to automatically match up a parsing rule to an AST class that can represent it and that contains a method to generate the target-language output (probably using StringTemplate, in this case).
In pseudo code, given this example grammar:
numberExpression
    : DIGIT+
    ;
I want to have it mapped to this AST class:
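class NumberExpressionAST extends BaseAST {
    private final double value;

    // ParseNode is a placeholder for whatever node type the parser hands over.
    public NumberExpressionAST(ParseNode node) {
        this.value = node.value;
    }

    public String generateCode() {
        // However we want to generate the output.
        // Maybe use a template, maybe string literals, maybe cure cancer...whatever.
        return String.valueOf(value); // placeholder so the method compiles
    }
}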
To mate them up, maybe there would be some glue like the following (or you could go crazy with Class.forName stuff):
switch (child.ruleName) {
    case "numberExpression":
        return new NumberExpressionAST(child);
    default:
        throw new IllegalArgumentException("No AST class for rule: " + child.ruleName);
}
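As a middle ground between a hand-written switch and Class.forName, the glue could be a small registry that maps rule names to AST constructors. The following is only a sketch under stated assumptions: `ParseNode` and `AstRegistry` are hypothetical names invented for illustration, not ANTLR types, and the node is reduced to a rule name plus its matched text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a parse-tree node: a rule name and its matched text.
final class ParseNode {
    final String ruleName;
    final String text;

    ParseNode(String ruleName, String text) {
        this.ruleName = ruleName;
        this.text = text;
    }
}

// Base class for all AST nodes; each subclass knows how to emit target code.
abstract class BaseAST {
    abstract String generateCode();
}

class NumberExpressionAST extends BaseAST {
    private final double value;

    NumberExpressionAST(ParseNode node) {
        this.value = Double.parseDouble(node.text);
    }

    @Override
    String generateCode() {
        // Placeholder output; a template engine could be called here instead.
        return Double.toString(value);
    }
}

// Hypothetical registry: maps rule names to AST constructors,
// so the mapping lives in Java code rather than in the grammar.
final class AstRegistry {
    private final Map<String, Function<ParseNode, BaseAST>> factories = new HashMap<>();

    void register(String ruleName, Function<ParseNode, BaseAST> factory) {
        factories.put(ruleName, factory);
    }

    BaseAST create(ParseNode node) {
        Function<ParseNode, BaseAST> factory = factories.get(node.ruleName);
        if (factory == null) {
            throw new IllegalArgumentException("No AST class registered for rule: " + node.ruleName);
        }
        return factory.apply(node);
    }
}

class AstRegistryDemo {
    public static void main(String[] args) {
        AstRegistry registry = new AstRegistry();
        registry.register("numberExpression", NumberExpressionAST::new);
        BaseAST ast = registry.create(new ParseNode("numberExpression", "42"));
        System.out.println(ast.generateCode()); // prints 42.0
    }
}
```

Adding a new rule then means one more `register` call, and an unknown rule fails loudly instead of silently falling through.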
I’ve been scouring the web and I found parse rewrite rules in the grammar with ->, but I can’t figure out how to keep all this logic out of the grammar, especially the code to set up and generate the target output from the template. I’m OK with walking the tree multiple times.
I thought that maybe I could use the option output=AST and then provide my own AST classes extending CommonTree? I’ll admit my grasp of ANTLR is very primitive, so forgive my ignorance. Every tutorial I follow shows doing all this stuff inline with the grammar, which to me is totally insane and hard to maintain.
Can someone point me to a way of accomplishing something similar?
Goal: keep AST/codegen/template logic out of the grammar.
EDIT ———————————————
I’ve resorted to tracing through ANTLR’s actual source code (since ANTLR is built using itself) and I’m seeing similar things like BlockAST, RuleAST, etc., all inheriting from CommonTree. I haven’t quite figured out the important part: how they’re being used.
From looking around, I noticed you can basically type hint tokens:
identifier
    : IDENTIFIER<AnyJavaClassIWantAST>
    ;
You can’t do exactly the same for parse rules…but if you create some token to represent the parse rule as a whole, you can use rewrite rules like so:
declaration
    : type identifier -> SOME_PARSE_RULE<AnyJavaClassIWantAST>
    ;
All this is closer to what I want, but ideally I shouldn’t have to litter the grammar…is there any way to put these somewhere else?
Here is a contrived example that uses a handful of ANTLR4’s features that go a long way towards separating the grammar from the output language, mainly the alternative labels and the generated listener. This example grammar can represent a few trivial bits of code, but it does so with no language references, not even a call to skip() for whitespace in the lexer. The test class converts the input to some Java output using the generated listener.
I avoided using anything that I couldn’t get to work on the first couple of tries, so don’t consider this an exhaustive example by any means.
Simplang.g
Along with the lexer and parser, ANTLR4 generates a listener interface and default empty implementing class. Here’s the interface generated for the grammar above.
SimplangListener.java
Here’s a test class that overrides a few methods in the empty listener and calls the parser.
SimplangTest.java
Here’s the test input hard-coded in the test class:
Here’s the output produced:
It’s a silly example, but it shows a few of the features that might be useful to you when building a custom AST.
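For readers without the generated files at hand, the listener idea described above can be imitated in plain Java. This is not ANTLR's generated code or runtime API: `SketchNode`, `SketchListener`, `SketchWalker`, and `JavaEmitter` are hypothetical names, and the tree is built by hand rather than parsed. It only illustrates the shape of the pattern, assuming labelled nodes such as "Assign" and "Print": the walker fires callbacks per label, and all output logic lives in a listener class outside the grammar.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node standing in for a generated parse-tree context.
final class SketchNode {
    final String label; // plays the role of a rule name or alternative label
    final String text;
    final List<SketchNode> children;

    SketchNode(String label, String text, SketchNode... children) {
        this.label = label;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// One callback per label, each with an empty default body,
// so implementations override only what they care about.
interface SketchListener {
    default void enterAssign(SketchNode node) {}
    default void enterPrint(SketchNode node) {}
}

// Depth-first walker that dispatches on the node label.
final class SketchWalker {
    static void walk(SketchListener listener, SketchNode node) {
        switch (node.label) {
            case "Assign": listener.enterAssign(node); break;
            case "Print":  listener.enterPrint(node);  break;
            default: break; // labels without callbacks are simply traversed
        }
        for (SketchNode child : node.children) {
            walk(listener, child);
        }
    }
}

// All target-language knowledge lives here, not in the grammar.
final class JavaEmitter implements SketchListener {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void enterAssign(SketchNode node) {
        // Assumed child layout: [identifier, integer literal].
        out.append("int ").append(node.children.get(0).text)
           .append(" = ").append(node.children.get(1).text).append(";\n");
    }

    @Override
    public void enterPrint(SketchNode node) {
        // Assumed child layout: [identifier].
        out.append("System.out.println(").append(node.children.get(0).text).append(");\n");
    }

    String output() {
        return out.toString();
    }
}

class ListenerSketchDemo {
    public static void main(String[] args) {
        SketchNode program = new SketchNode("Program", "",
            new SketchNode("Assign", "x = 5", new SketchNode("Id", "x"), new SketchNode("Int", "5")),
            new SketchNode("Print", "print x", new SketchNode("Id", "x")));
        JavaEmitter emitter = new JavaEmitter();
        SketchWalker.walk(emitter, program);
        System.out.print(emitter.output());
    }
}
```

Emitting a different target language would mean writing another listener class and leaving the tree and walker untouched, which is the separation the question asks for.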