I am trying to develop a mini DSL for software configuration, using ANTLRWorks for prototyping. A typical source file would look like:
    name: myname;
    value: myvalue;
    flag debug {
        value = debugvalue;
    }
    if flag(debug) {
        libname = foo_d;
    } else {
        libname = foo;
    }
Now, I never took a formal course on parsing, so I am doing all this by trial and error with ANTLRWorks and some basics of BNF grammars. One constant problem I encounter is whitespace and newline handling. I defined something like:
    program: statement* EOF;
    statement: compound_statement | selection_statement | field_statement;
    selection_statement: 'if' expr statement;
    statement_list: (WS* statement)+;
    compound_statement: '{' statement_list? '}';
    field_statement: name_statement | value_statement;
    name_statement: 'name' WS* ':' WS* WORD WS* ';';
    value_statement: 'value' WS* ':' WS* WORD WS* ';';

    // Tokens
    WS : (' ' | '\t' | '\n');
    WORD: ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
But the whitespace handling is very buggy; it breaks in all kinds of cases. What is the standard way of doing this? Is there any resource for learning this kind of thing quickly (something like building a calculator with conditionals and variables in ANTLR)? The ANTLR grammars I found are either trivial or full-fledged languages.
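Usually, you would do this by adding a {$channel = HIDDEN;} action to the WS rule, so that whitespace tokens never reach the parser and the parser rules no longer need to mention WS at all; see this page, section Lexer rules for details.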
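As a rough sketch of the hidden-channel approach applied to part of the grammar in the question: this assumes ANTLR 3 (the version ANTLRWorks 1.x targets) with the default Java target. The grammar name Config is made up for illustration, and the 'if' rule is left out because expr is not defined in the question.

```antlr
grammar Config;   // hypothetical grammar name

// Parser rules: no WS references are needed anywhere.
program            : statement* EOF ;
statement          : compound_statement | field_statement ;
compound_statement : '{' statement* '}' ;
field_statement    : name_statement | value_statement ;
name_statement     : 'name' ':' WORD ';' ;
value_statement    : 'value' ':' WORD ';' ;

// Tokens
WORD : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;

// Whitespace is still matched by the lexer, but the action routes
// each WS token to the hidden channel, which the parser ignores.
// '\r' is added on the assumption that Windows line endings may occur.
WS   : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

With this in place, inputs such as "name:foo;" and "name : foo ;" should parse identically. If you move to ANTLR 4, the equivalent is a lexer command, written as -> channel(HIDDEN) or -> skip after the rule body, instead of an embedded action.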