I created a simple grammar in ANTLRWorks. Then I generated the code and got two files: grammarLexer.java and grammarParser.java. My goal is to map my grammar to the Java language. What should I do next to achieve this?
Here is my grammar:
```
grammar grammar;
prog : ((FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | VARIABLE) | FUNCTION_DEC)+;
FOR : WS* 'for' WS+ VARIABLE WS+ DIGIT+ WS+ DIGIT+ WS* ENTER ( FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC )* WS* 'end' WS* ENTER;
WHILE : WS* 'while' WS+ (VARIABLE | DIGIT+) WS* EQ_OPERATOR WS* (VARIABLE | DIGIT+) WS* ENTER (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | (WS* INC_DEC))* WS* 'end' WS* ENTER;
IF : WS* 'if' WS+ ( FUNCTION | VARIABLE | DIGIT+) WS* EQ_OPERATOR WS* (VARIABLE | DIGIT+) WS* ENTER (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC)* ( WS* 'else' ENTER (FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | (WS* INC_DEC))*)? WS* 'end' WS* ENTER;
CHAR : ('a'..'z'|'A'..'Z')+;
EQ_OPERATOR : ('<' | '>' | '==' | '>=' | '<=' | '!=');
DIGIT : '0'..'9'+;
ENTER : '\n';
WS : ' ' | '\t';
PRINT_TEMPLATE : WS+ (('"' (CHAR | DIGIT | WS)* '"') | VARIABLE | DIGIT+ | FUNCTION | INC_DEC);
PRINT : WS* 'print' PRINT_TEMPLATE (',' PRINT_TEMPLATE)* WS* ENTER;
VARIABLE : CHAR(CHAR|DIGIT)*;
FUN_TEMPLATE : WS* (VARIABLE | DIGIT+ | '"' (CHAR | DIGIT | WS)* '"');
FUNCTION : VARIABLE '(' (FUN_TEMPLATE (WS* ',' FUN_TEMPLATE)*)? ')' WS* ENTER*;
DECLARATION : WS* VARIABLE WS* ('=' WS* (DIGIT+ | '"' (CHAR | DIGIT | WS)* '"' | VARIABLE)) WS* ENTER;
FUNCTION_DEC : WS*'def' WS* FUNCTION ( FOR | WHILE | IF | PRINT | DECLARATION | ENTER | (WS* FUNCTION) | INC_DEC )* WS* 'end' WS* ENTER*;
INC_DEC : VARIABLE ('--' | '++') WS* ENTER*;
```
Here is my Main class for the parser:
```java
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonToken;
import org.antlr.runtime.CommonTokenStream;

public class Main {
    public static void main(String[] args) throws Exception {
        // the input source
        String source =
            "for i 1 3\n " +
            "printHi()\n " +
            "end\n " +
            "if fun(y, z) == 0\n " +
            "end\n ";

        // create an instance of the lexer
        grammarLexer lexer = new grammarLexer(new ANTLRStringStream(source));

        // wrap a token-stream around the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        // traverse the tokens and print them to see if the correct tokens are created
        int n = 1;
        for (Object o : tokens.getTokens()) {
            CommonToken token = (CommonToken) o;
            System.out.println("token(" + n + ") = " + token.getText().replace("\n", "\\n"));
            n++;
        }

        // invoke the start rule: the grammar defines `prog`, there is no `file` rule
        grammarParser parser = new grammarParser(tokens);
        parser.prog();
    }
}
```
As I already mentioned in the comments, your overuse of lexer rules is wrong. Look at lexer rules as the fundamental building blocks of your language, much like how you'd describe water in chemistry. You would not describe water as a single lexer rule that matches `HHO`,
i.e. as a single element. Water should be described as 3 separate elements: `water : Hydrogen Hydrogen Oxygen;`
where `Hydrogen` and `Oxygen` are the fundamental building blocks (lexer rules) and `water` is the compound (the parser rule). A good rule of thumb: if you're creating lexer rules that consist of several other lexer rules, chances are there's something fishy in your grammar. This is not always the case, of course.
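Applied to the input from the question, this means a line like `for i 1 3` should not become one big `FOR` token. The lexer should only produce small tokens (keywords, identifiers, numbers, operators, newlines) and leave recognizing the `for` statement to a parser rule. As a rough illustration, here is a hand-written, stdlib-only tokenizer in plain Java that splits the question's input that way. It is not ANTLR-generated code; the class name `TinyLexer` and the token type names are hypothetical, and string literals are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-written sketch, NOT ANTLR output: each token type is one small building block.
public class TinyLexer {

    // Hypothetical token types for the question's input.
    enum Type { KEYWORD, ID, INT, OP, NEWLINE }

    static final class Token {
        final Type type;
        final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text.replace("\n", "\\n") + ")";
        }
    }

    static final List<String> KEYWORDS =
            Arrays.asList("for", "while", "if", "else", "end", "print", "def");
    static final List<String> TWO_CHAR_OPS = Arrays.asList("==", "!=", "<=", ">=");

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ' ' || c == '\t') {
                i++; // whitespace separates tokens but is never part of one
            } else if (c == '\n') {
                tokens.add(new Token(Type.NEWLINE, "\n"));
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(new Token(Type.INT, src.substring(start, i)));
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                // keywords and identifiers have the same shape; the word decides the type
                tokens.add(new Token(KEYWORDS.contains(word) ? Type.KEYWORD : Type.ID, word));
            } else {
                // try two-character operators first, so "==" is not split into "=" "="
                String two = i + 1 < src.length() ? src.substring(i, i + 2) : "";
                if (TWO_CHAR_OPS.contains(two)) {
                    tokens.add(new Token(Type.OP, two));
                    i += 2;
                } else if ("<>=(),".indexOf(c) >= 0) {
                    tokens.add(new Token(Type.OP, String.valueOf(c)));
                    i++;
                } else {
                    throw new IllegalArgumentException("unexpected '" + c + "' at index " + i);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "for i 1 3\n  printHi()\nend\nif fun(y, z) == 0\nend\n";
        for (Token t : tokenize(source)) {
            System.out.println(t);
        }
    }
}
```

In an ANTLR 3 grammar, the same separation usually means one small lexer rule per token kind, with whitespace either skipped (`{skip();}`) or sent to the hidden channel (`{$channel=HIDDEN;}`), and statements such as `for ... end` written as parser rules over those tokens.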
Let’s say you want to parse the following input:
A grammar could look like this:
And if you now run this test class:
you’ll see some output being printed to the console which corresponds to the following AST:
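The question's actual goal was mapping the language to Java. Once the parser produces a tree, that mapping is usually done by walking the tree and emitting Java source for each node. Below is a minimal, stdlib-only sketch of such a walker. It does not use ANTLR's tree classes (ANTLR 3's own type is `CommonTree`): the `Node` class, the node labels `BLOCK`, `FOR` and `CALL`, and the hand-built tree are all hypothetical, and it assumes `for i 1 3` means an inclusive loop from 1 to 3, which the grammar itself does not specify.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: a hypothetical AST node type and a recursive walker that emits Java source.
public class JavaEmitter {

    // Hypothetical node: a label plus children (this is NOT ANTLR's CommonTree).
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    static String toJava(Node root) {
        StringBuilder out = new StringBuilder();
        emit(root, 0, out);
        return out.toString();
    }

    // Emits Java source for node n, indented by `depth` levels of four spaces.
    static void emit(Node n, int depth, StringBuilder out) {
        StringBuilder indent = new StringBuilder();
        for (int k = 0; k < depth; k++) indent.append("    ");

        switch (n.label) {
            case "BLOCK":
                // a sequence of statements at the same depth
                for (Node child : n.children) emit(child, depth, out);
                break;
            case "FOR": {
                // children: variable, lower bound, upper bound, body (a BLOCK)
                // assumption: the upper bound is inclusive, i.e. `for i 1 3` runs i = 1, 2, 3
                String v = n.children.get(0).label;
                out.append(indent).append("for (int ").append(v)
                   .append(" = ").append(n.children.get(1).label).append("; ")
                   .append(v).append(" <= ").append(n.children.get(2).label).append("; ")
                   .append(v).append("++) {\n");
                emit(n.children.get(3), depth + 1, out);
                out.append(indent).append("}\n");
                break;
            }
            case "CALL": {
                // children: function name, then zero or more arguments
                StringBuilder args = new StringBuilder();
                for (int k = 1; k < n.children.size(); k++) {
                    if (k > 1) args.append(", ");
                    args.append(n.children.get(k).label);
                }
                out.append(indent).append(n.children.get(0).label)
                   .append("(").append(args).append(");\n");
                break;
            }
            default:
                throw new IllegalArgumentException("no Java mapping for node " + n.label);
        }
    }

    public static void main(String[] args) {
        // Hand-built tree for the first statement of the question's input:
        //   for i 1 3
        //     printHi()
        //   end
        Node tree = new Node("BLOCK",
                new Node("FOR", new Node("i"), new Node("1"), new Node("3"),
                        new Node("BLOCK", new Node("CALL", new Node("printHi")))));
        System.out.print(toJava(tree));
    }
}
```

Running `main` prints a three-line Java `for` loop (`for (int i = 1; i <= 3; i++) {`) with `printHi();` in its body. Supporting `if`, `while`, `print` and declarations would mean adding one `case` per node label in the same way.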