I have been learning ANTLR in order to create a domain-specific language. One of the requirements is to translate this DSL into C. I have a basic grammar that recognizes the DSL, but I am having trouble translating it to C. My main problem is translating the DSL if statement into a C if statement. I have tried using print statements in the grammar, to no avail (I am using C#).
Here is the grammar I have been testing with:
**ifTest.g**
grammar ifTest;
options
{
backtrack=true;
output=AST;
language=CSharp2;
}
/*************************
PARSER RULES
*************************/
prog : lambda
| statements EOF;
lambda : /* Empty */;
statements
: statement+;
statement
: logical
| assignment
| NEWLINE;
logical : IF a=logical_Expr THEN b=statements
{
System.Console.Write("\tif (" + $a.text + ")\n\t{\n\t" + "\t" + $b.text + "\n\n\t}");
}
( ELSE c=statements
{
System.Console.Write("\n\telse {\n\t\t\t" + $c.text + "\n\t}");
} )?
ENDIF
{
System.Console.Write("\n}");
}
;
logical_Expr
: expr
;
expr : (simple_Expr) (op expr)*
;
simple_Expr : MINUS expr
| identifier
| number
;
identifier : parameter
| VARIABLE
;
parameter : norm_parameter
;
norm_parameter : spec_label
| reserved_parm
;
spec_label : LABEL
;
reserved_parm : RES_PARM
;
op : PLUS
| MINUS
| MULT
| DIV
| EQUALS
| GT
| LT
| GE
| LE
;
number : INT
| FLOAT
| HEX
;
assignment : identifier GETS expr
;
/*************************
LEXER RULES
*************************/
WS : (' '|'\t')+ {$channel=HIDDEN;};
COMMENT : '/*' (options {greedy=false;}:.)* '*/' {$channel=HIDDEN;}
;
LINECOMMENT
: '#' ~('\n'|'\r')* NEWLINE {$channel=HIDDEN;}
;
NEWLINE : '\r'?'\n' {$channel=HIDDEN;};
IF : I F;
THEN : T H E N;
ELSE : E L S E;
ENDIF : E N D I F;
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
EQUALS : '=';
GT : '>';
LT : '<';
GE : '>=';
LE : '<=';
ULINE : '_';
DOT : '.';
GETS : ':=';
LABEL : (LETTER|ULINE)(LETTER|DIGIT|ULINE)*;
INT : '-'?DIGIT+;
FLOAT : '-'? DIGIT* DOT DIGIT+;
HEX : ('0x'|'0X')(HEXDIGIT)HEXDIGIT*;
RES_PARM: DIGIT LABEL;
VARIABLE: '\$' LABEL;
fragment A:'A'|'a'; fragment B:'B'|'b'; fragment C:'C'|'c'; fragment D:'D'|'d';
fragment E:'E'|'e'; fragment F:'F'|'f'; fragment G:'G'|'g'; fragment H:'H'|'h';
fragment I:'I'|'i'; fragment J:'J'|'j'; fragment K:'K'|'k'; fragment L:'L'|'l';
fragment M:'M'|'m'; fragment N:'N'|'n'; fragment O:'O'|'o'; fragment P:'P'|'p';
fragment Q:'Q'|'q'; fragment R:'R'|'r'; fragment S:'S'|'s'; fragment T:'T'|'t';
fragment U:'U'|'u'; fragment V:'V'|'v'; fragment W:'W'|'w'; fragment X:'X'|'x';
fragment Y:'Y'|'y'; fragment Z:'Z'|'z';
fragment DIGIT
: '0'..'9';
fragment LETTER
: A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z;
fragment HEXDIGIT
: '0'..'9'|'a'..'f'|'A'..'F';
I am testing this with the following C# class:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Antlr.Runtime;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string inputString = "if $variable1 = 0 then\n if $variable2 > 250 then\n $variable3 := 0\n endif\n endif";
            Console.WriteLine("Here is the input string:\n " + inputString + "\n");
            ANTLRStringStream input = new ANTLRStringStream(inputString);
            ifTestLexer lexer = new ifTestLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            ifTestParser parser = new ifTestParser(tokens);
            parser.prog();
            Console.Read();
        }
    }
}
The output is not quite how I imagined.
**Output**
if ($variable2 > 250)
{
$variable3 := 0
}
} if ($variable1 = 0)
{
if $variable2 > 250 then
$variable3 := 0
endif
}
}
The problem seems to be that the second if statement is printed twice, and not in the order I was hoping for. I assume this is because I am simply emitting the statements block from within the print statements, but I am not sure how to get this to work properly. I have been reading up on StringTemplate, and on creating an AST and using a tree walker to walk it, but is there any way to fix the above output so that it looks something like this?
if ($variable1 = 0)
{
    if ($variable2 > 250)
    {
        $variable3 := 0
    }
}
Any help on which direction I should be taking would be greatly appreciated. Would it be better for me to take the leap to StringTemplate, or is there some way for me to do this with basic action code? If I left any information out, please feel free to ask.
If you remove the backtracking, which is easily done in your case, you can let the parser build the C code immediately.
Note that parser rules can take parameters (the indentation level in my example below) and can return custom objects (Strings in the example):
Here’s your grammar without backtracking and outputting to C code (I’m not too good at C#, so the demo is in Java):
If you now test your parser with the input:
the following is printed to the console:
If other parts of your grammar rely (heavily) on predicates (backtracking), the same strategy can be applied just as easily, but in a tree grammar (that is, after the backtracking parser has done its job and produced an AST).
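To make the strategy concrete without depending on ANTLR, here is a minimal hand-written sketch in Java of the same idea: every rule method takes the current indentation level and returns the generated code as a String instead of printing it, so an enclosing if statement can place the already translated inner block exactly where it belongs. This is not generated parser code. The class `IfTranslator` and all of its methods are hypothetical names, tokens are assumed to be separated by whitespace, and the right-hand side of an assignment is assumed to be a single token.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a generated parser. Every name here is hypothetical.
final class IfTranslator {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private IfTranslator(String source) {
        // Simplification: tokens are whitespace-separated, unlike the real lexer.
        for (String t : source.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    static String translate(String source) {
        IfTranslator p = new IfTranslator(source);
        String code = p.statements(0);
        if (p.pos != p.tokens.size()) {
            throw new IllegalStateException("unexpected token: " + p.peek());
        }
        return code;
    }

    // statements : statement+ ;  stops at ELSE, ENDIF or end of input
    private String statements(int indent) {
        StringBuilder sb = new StringBuilder();
        while (pos < tokens.size() && !isKeyword("else") && !isKeyword("endif")) {
            sb.append(isKeyword("if") ? logical(indent) : assignment(indent));
        }
        return sb.toString();
    }

    // logical : IF expr THEN statements (ELSE statements)? ENDIF ;
    private String logical(int indent) {
        String pad = pad(indent);
        expect("if");
        String cond = tokensUntil("then");
        expect("then");
        // The nested block is translated first and returned, not printed.
        String thenBlock = statements(indent + 1);
        StringBuilder sb = new StringBuilder();
        sb.append(pad).append("if (").append(cond).append(")\n")
          .append(pad).append("{\n").append(thenBlock).append(pad).append("}\n");
        if (isKeyword("else")) {
            expect("else");
            String elseBlock = statements(indent + 1);
            sb.append(pad).append("else\n")
              .append(pad).append("{\n").append(elseBlock).append(pad).append("}\n");
        }
        expect("endif");
        return sb.toString();
    }

    // assignment : identifier ':=' value ;  kept as DSL text, as in the question
    private String assignment(int indent) {
        String target = next();
        expect(":=");
        String value = next(); // simplification: single-token right-hand side
        return pad(indent) + target + " := " + value + "\n";
    }

    private String tokensUntil(String keyword) {
        StringBuilder sb = new StringBuilder();
        while (!isKeyword(keyword)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(next()); // throws if the keyword never appears
        }
        return sb.toString();
    }

    private boolean isKeyword(String k) {
        return pos < tokens.size() && tokens.get(pos).equalsIgnoreCase(k);
    }

    private void expect(String k) {
        if (!isKeyword(k)) {
            throw new IllegalStateException("expected '" + k + "' but found: " + peek());
        }
        pos++;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<end of input>";
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    private static String pad(int indent) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indent; i++) sb.append("    ");
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "if $variable1 = 0 then\n if $variable2 > 250 then\n $variable3 := 0\n endif\n endif";
        System.out.print(translate(input));
    }
}
```

Run on the input string from the question, this prints the nested layout the question asks for, with the inner if indented one level inside the outer braces. The point is the design choice rather than the toy parser: because each rule returns its text to the caller, nothing is written until the outermost statement has been assembled, which avoids the inside-out ordering that printing from within the rule actions produces.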