The grammar below parses input of the form ( left part = right part # comment ), where the # comment part is optional.
Two questions:
- Sometimes I get this warning (ANTLRWorks 1.4.2), referencing id2:
  Decision can match input such as “{Int, Word}” using multiple alternatives: 1, 2
  But only sometimes!
- The next extension should be that the comment (id2) can contain the chars ‘(’ and ‘)’.
The grammar:
grammar NestedBrackets1a1;
//==========================================================
// Lexer Rules
//==========================================================
Int
: Digit+
;
fragment Digit
: '0'..'9'
;
Special
: ( TCUnderscore | TCQuote )
;
TCListStart : '(' ;
TCListEnd : ')' ;
fragment TCUnderscore : '_' ;
fragment TCQuote : '"' ;
// A word must start with a letter
Word
: ( 'a'..'z' | 'A'..'Z' | Special ) ('a'..'z' | 'A'..'Z' | Special | Digit )*
;
Space
: ( ' ' | '\t' | '\r' | '\n' ) { $channel = HIDDEN; }
;
//==========================================================
// Parser Rules
//==========================================================
assignment
: TCListStart id1 '=' id1 ( comment )? TCListEnd
;
id1
: Word+
;
comment
: '#' ( id2 )*
;
id2
: ( Word | Int )+
;
No, the grammar you posted will always produce this warning. Perhaps you don’t always notice it (your IDE plugin or ANTLRWorks might show it in a tab you don’t have open), but the warning is there. Convince yourself by creating a lexer/parser from the command line: the ANTLR tool will print the same warning there.
This is because you have a * after ( id2 ) inside your comment rule, and id2 is itself a repetition of tokens: ( Word | Int )+. Let’s say your input is "# foo bar" (a # followed by two Word tokens). ANTLR can now parse the input in more than one way: the two tokens "foo" and "bar" could be matched by ( id2 )*, where id2 matches a single Word token at a time, but "foo" and "bar" could also be matched in one go by a single invocation of the id2 rule.

Look at the merged rules: with id2 inlined, comment is effectively '#' ( ( Word | Int )+ )*.
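To make the ambiguity concrete, here is the comment rule with id2 inlined, annotated with the two ways the example input can be derived. It is an illustrative sketch, not a rule to paste into the grammar:

```antlr
// comment with id2 inlined: an outer * wrapped around an inner +
comment
  :  '#' ( ( Word | Int )+ )*
  ;

// Input: # foo bar
//   derivation 1: one outer iteration, the inner + consumes foo and bar
//   derivation 2: two outer iterations, the inner + consumes one token each
```

Both derivations consume exactly the same tokens, so when the parser sees another Word or Int it cannot tell from lookahead whether to stay in the inner loop or start a new outer iteration.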
See how you’re repeating a repetition: ( ( ... )+ )*? This is usually a problem, as it is in your case.

Resolve this problem by either replacing the * with a ? in the comment rule, which gives '#' ( id2 )?, or by removing the + from the id2 rule, which gives ( Word | Int ).

As for your second question: that is asking for trouble, since a comment is followed by a TCListEnd, which is a ). I don’t recommend letting a comment match ).

EDIT
Note that comments are usually stripped from the source file while tokenizing the input source. That way you don’t need to account for them in your parser rules. You can do that by “skipping” these tokens in a lexer rule.
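The rule from the original answer is not preserved here, so the following is only a minimal sketch of what such a lexer rule could look like for this grammar. It assumes a comment runs from # up to (but not including) the closing ) or the end of the line; the rule name Comment is a hypothetical choice, and the parser rules comment and id2 are assumed to be removed:

```antlr
// Hypothetical lexer rule: discard everything from '#' up to, but not
// including, the closing ')' or a line break.
Comment
  :  '#' ~( ')' | '\r' | '\n' )* { skip(); }
  ;

// With comments discarded by the lexer, the parser rule no longer
// needs to mention them.
assignment
  :  TCListStart id1 '=' id1 TCListEnd
  ;
```

Under these assumptions a comment may contain ( and any other character, but still not ), since that character ends the list.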