I’m looking for a grammar for analyzing two types of sentences, that
is, words separated by white space:
- ID1: sentences with words not beginning with numbers
- ID2: sentences made up of words not beginning with numbers, and of plain numbers
Basically, the structure of the grammar should look like
ID1 separator ID2
ID1: a word can contain digits, like Var1234, but cannot start with a digit
ID2: same as above, but a plain number like 1234 is also allowed
separator: e.g. '='
@Bart
I just tried to add two tokens '_' and '"' as lexer-rule Special for later use in lexer-rule Word.
Even though I haven’t used Special anywhere in the following grammar yet, I get the following error in ANTLRWorks 1.4.2:
The following token definitions can never be matched because prior tokens match the same input: Special
But when I add fragment before Special, I don’t get that error. Why?
grammar Sentence1b1;

tokens
{
  TCUnderscore = '_' ;
  TCQuote = '"' ;
}

assignment
  : id1 '=' id2
  ;

id1
  : Word+
  ;

id2
  : ( Word | Int )+
  ;

Int
  : Digit+
  ;

// A word must start with a letter
Word
  : ( 'a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | Digit )*
  ;

Special
  : ( TCUnderscore | TCQuote )
  ;

Space
  : ( ' ' | '\t' | '\r' | '\n' ) { $channel = HIDDEN; }
  ;

fragment Digit
  : '0'..'9'
  ;
The lexer rule Special is then meant to be used in the lexer rule Word:

Word
  : ( 'a'..'z' | 'A'..'Z' | Special ) ('a'..'z' | 'A'..'Z' | Special | Digit )*
  ;
I’d go for something like this:
which will parse the input:
as follows:
EDIT
To keep the lexer rules nicely packed together, I’d keep them all at the bottom of the grammar instead of partly in the tokens { ... } block, which I only use for defining “imaginary tokens” (used in AST creation):

Now, with the rules above, TCUnderscore and TCQuote can never become a token, because when the lexer stumbles upon a _ or ", a Special token is created. Or in this case:

the Special token can never be created, because the lexer would first create TCUnderscore and TCQuote tokens. Hence the error:

The following token definitions can never be matched because prior tokens match the same input: Special

If you make TCUnderscore and TCQuote fragment rules, you don’t have that problem, because fragment rules only “serve” other lexer rules. So this works:

Also, fragment rules can therefore never be “visible” in any of your parser rules (the lexer will never create a TCUnderscore or TCQuote token!).
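
As a rough illustration of the fragment approach described above, here is a minimal sketch of the question's grammar in ANTLR 3 syntax with all lexer rules at the bottom. It is not the grammar from the original answer. Two things are assumptions added here: the helper rule Letter is hypothetical, and Special is itself marked as a fragment, on the assumption that it is only ever used inside Word.

```antlr
grammar Sentence1b1;

// Parser rules, unchanged from the question
assignment
  : id1 '=' id2
  ;

id1
  : Word+
  ;

id2
  : ( Word | Int )+
  ;

// Lexer rules: only Int, Word and Space ever become tokens
Int
  : Digit+
  ;

// A word must start with a letter or a special character
Word
  : ( Letter | Special ) ( Letter | Special | Digit )*
  ;

Space
  : ( ' ' | '\t' | '\r' | '\n' ) { $channel = HIDDEN; }
  ;

// Fragment rules only serve other lexer rules and never produce tokens,
// so they cannot compete with each other for the same input.
// Assumption: Special is only needed inside Word, so it is a fragment too.
fragment Special
  : TCUnderscore | TCQuote
  ;

fragment TCUnderscore : '_' ;
fragment TCQuote      : '"' ;

// Hypothetical helper, not part of the original grammar
fragment Letter : 'a'..'z' | 'A'..'Z' ;
fragment Digit  : '0'..'9' ;
```

With this layout, an input such as _foo = bar1 42 should be tokenized into Word, '=', Word and Int tokens only. None of the fragment rules can be referenced from assignment, id1 or id2.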