Using the Java grammar for ANTLR, I can read a Java source file and print out its tokens sequentially.
String filePath = JAVA_SOURCE;
// Alternative: new ANTLRStringStream(readFileAsString(filePath))
InputStream inputStream = new FileInputStream(filePath);
ANTLRInputStream in = new ANTLRInputStream(inputStream);
Java6Lex lexer = new Java6Lex(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
// Print the type of each token until end of input (Token.EOF is -1).
while (true) {
    int val = tokens.LA(1);
    if (val == Token.EOF) {
        break;
    }
    tokens.consume();
    System.out.printf("%d ", val);
}
59 54 38 54 81 61 92 59 54 38 54 96 61 92 59 54 38 54 81 54 92 90 54 92 …
How can I map each token back to its position in JAVA_SOURCE? Does ANTLR keep a counter or something similar?
By default, ANTLR produces CommonTokens. Read the full API here: http://www.antlr.org/api/Java/org/antlr/runtime/CommonToken.html

Here’s a demo to print some information about the tokens the Java6 parser encounters:

Save the grammar as Java6.g (and name the grammar “Java6”, of course!).

Copy-paste the following in your Java6.g file:

Now run the Java6 file:
and you will see the following being printed to your console:
package test;

public class Test {

  int n = 42;
}

type=PACKAGE, text='package', line=1, startIndex=0, charPositionInLine=0
type=IDENTIFIER, text='test', line=1, startIndex=8, charPositionInLine=8
type=SEMI, text=';', line=1, startIndex=12, charPositionInLine=12
type=PUBLIC, text='public', line=3, startIndex=15, charPositionInLine=0
type=CLASS, text='class', line=3, startIndex=22, charPositionInLine=7
type=IDENTIFIER, text='Test', line=3, startIndex=28, charPositionInLine=13
type=LBRACE, text='{', line=3, startIndex=33, charPositionInLine=18
type=INT, text='int', line=5, startIndex=38, charPositionInLine=2
type=IDENTIFIER, text='n', line=5, startIndex=42, charPositionInLine=6
type=EQ, text='=', line=5, startIndex=44, charPositionInLine=8
type=INTLITERAL, text='42', line=5, startIndex=46, charPositionInLine=10
type=SEMI, text=';', line=5, startIndex=48, charPositionInLine=12
type=RBRACE, text='}', line=6, startIndex=50, charPositionInLine=0

And if you’re looking for a way to get tokens from parser rules, every parser rule has a start and stop member in its ParserRuleReturnScope that can be cast to a CommonToken.
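
To make the relationship between the three position fields in that output concrete, here is a small sketch in plain Java (no ANTLR needed) that derives line and charPositionInLine from a startIndex. The class name TokenPositions and its two methods are made up for illustration and are not part of ANTLR. It also assumes the test file uses "\n" line endings, with lines counted from 1 and columns from 0, which is what the numbers printed above suggest.

```java
final class TokenPositions {

    /** 1-based line number of the character at startIndex. */
    static int lineOf(String source, int startIndex) {
        int line = 1;
        // Every newline before the token pushes it one line down.
        for (int i = 0; i < startIndex; i++) {
            if (source.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    /** 0-based column of the character at startIndex within its line. */
    static int charPositionInLine(String source, int startIndex) {
        // The line starts right after the previous newline (or at 0 if there is none).
        int lineStart = source.lastIndexOf('\n', startIndex - 1) + 1;
        return startIndex - lineStart;
    }

    public static void main(String[] args) {
        // The sample file from the demo, reconstructed under the "\n" assumption.
        String source = "package test;\n\npublic class Test {\n\n  int n = 42;\n}";
        int[] startIndexes = {0, 15, 38, 50};
        for (int startIndex : startIndexes) {
            System.out.printf("startIndex=%d -> line=%d, charPositionInLine=%d%n",
                    startIndex,
                    lineOf(source, startIndex),
                    charPositionInLine(source, startIndex));
        }
    }
}
```

In practice you would read these values straight from the CommonToken, as the demo output shows. The sketch is only meant to show that startIndex is an offset into the whole input, while line and charPositionInLine locate the same character relative to line breaks.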