I am trying to create a grammar which accepts any character or number or just about anything, provided its length is equal to 1.
Is there a function to check the length?
EDIT
Let me make my question clearer with an example.
I wrote the following code:
grammar first;

tokens {
    SET = 'set';
    VAL = 'val';
    UND = 'und';
    CON = 'con';
    ON = 'on';
    OFF = 'off';
}

@parser::members {
    private boolean inbounds(Token t, int min, int max) {
        int n = Integer.parseInt(t.getText());
        return n >= min && n <= max;
    }
}

parse : SET expr;

expr : VAL('u'('e')?)? String |
       UND('e'('r'('l'('i'('n'('e')?)?)?)?)?)? (ON | OFF) |
       CON('n'('e'('c'('t')?)?)?)? oneChar
     ;

CHAR : 'a'..'z';
DIGIT : '0'..'9';
String : (CHAR | DIGIT)+;
dot : .;
oneChar : dot { $dot.text.length() == 1;} ;
Space : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

I want my grammar to do the following things:
- Accept commands like ‘set value abc’, ‘set underli on’ and ‘set conn #’. The grammar should be intelligent enough to accept incomplete words, such as ‘underl’ instead of ‘underline’.
- The third syntax, ‘set connect oneChar’, should accept any character, but just one character. It can be a digit, a letter or any special character. I am getting a compiler error in the generated parser file because of this.
- The first syntax, ‘set value’, should accept all possible strings, even on and off. But when I give something like ‘set value offer’, the grammar fails. I think this is happening because I already have a token ‘OFF’.
None of the three requirements listed above works in my grammar, and I don’t know why.
There are some mistakes and/or bad practices in your grammar:
#1
The action in your oneChar rule, { $dot.text.length() == 1;}, is not a validating predicate.
A proper validating predicate in ANTLR has a question mark after the closing brace, and the inner code has no semicolon at the end. So it should be { $dot.text.length() == 1 }? instead.
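
Applied to the rule from the question, the corrected predicate syntax might look like the sketch below. It only fixes the syntax; it assumes the dot rule is still present, and point #4 explains why that rule does not match a single character.

```antlr
// Validating predicate: question mark after the closing brace,
// no semicolon inside the braces.
oneChar
  :  dot { $dot.text.length() == 1 }?
  ;
```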
#2
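You should not be handling the abbreviated forms of these commands (the optional trailing letters after VAL, UND and CON in your expr rule) in a parser rule. You should let the lexer handle this instead, by making the optional letters part of the VAL, UND and CON lexer rules.
(also see #5!)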
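
A sketch of how the abbreviations could move into the lexer, assuming the VAL, UND and CON entries are removed from the tokens block so that these rules are their only definition:

```antlr
// The parser rule only refers to tokens; no literals are left in it.
expr
  :  VAL String
  |  UND (ON | OFF)
  |  CON oneChar
  ;

// Each lexer rule accepts the three-letter prefix plus any
// longer prefix of the full word.
VAL : 'val' ('u' ('e')?)?;
UND : 'und' ('e' ('r' ('l' ('i' ('n' ('e')?)?)?)?)?)?;
CON : 'con' ('n' ('e' ('c' ('t')?)?)?)?;
```

With rules like these, value, valu and val would all be expected to come out of the lexer as a single VAL token.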
#3
Your lexer rules CHAR, DIGIT and String are making things complicated for you. Because of them, the lexer can produce three different kinds of tokens: CHAR, DIGIT or String. Ideally, you should only create String tokens, since a String can already be a single CHAR or DIGIT. You can do that by adding the fragment keyword before these rules (fragment CHAR : 'a'..'z'; and fragment DIGIT : '0'..'9';).
There will now be no CHAR and DIGIT tokens in your token stream, only String tokens. In short: fragment rules are only used inside lexer rules, by other lexer rules. They will never be tokens of their own (and can therefore never appear in any parser rule!).
#4
The rule dot : .; does not do what you think it does. It matches “any token”, not “any character”. Inside a lexer rule, the . matches any character, but in parser rules it matches any token. Realize that parser rules can only make use of the tokens created by the lexer.
The input source is first tokenized based on the lexer rules. After that has been done, the parser (through its parser rules) can then operate on these tokens (not characters!!!). Make sure you understand this! (If not, ask for clarification or grab a book about ANTLR.)
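
Because the parser only sees tokens, "exactly one character" has to be expressed in terms of tokens. One possible way to do that is sketched below, untested. The Other rule is a hypothetical catch-all that is not part of the question's grammar, and the String rule is written out in full so that the fragment stands on its own.

```antlr
// Either a String token whose text is one character long,
// or any single character that no other lexer rule matched.
oneChar
  :  String {$String.text.length() == 1}?   // validating predicate on the token's text
  |  Other
  ;

String : ('a'..'z' | '0'..'9')+;

// Hypothetical catch-all rule; it has to be placed after all other lexer rules.
Other  : . ;
```

Under these assumptions, an input such as conn xy would tokenize xy as one String of length 2, so the predicate would fail and the parser would report an error.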
– an example –
Take the following grammar:
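
A minimal grammar consistent with the description that follows. The grammar name T is hypothetical, and the rule bodies are inferred from that description, so treat them as an assumption:

```antlr
grammar T;

// In a parser rule, '.' matches any single token.
p : . ;

A : 'a' | 'A';
B : 'b' | 'B';
```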
The parser rule p will now match any token that the lexer produces, which is only an A or a B token. So, p can only match one of the characters 'a', 'A', 'b' or 'B', nothing else.
And in the following grammar:
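
Again a sketch inferred from the description that follows, not necessarily the original listing; the grammar name T2 is hypothetical:

```antlr
grammar T2;

// Parser rule: matches any single token (FOO or BAR).
prs : . ;

// Defined first, so it captures the character 'a'.
FOO : 'a';

// In a lexer rule, '.' matches any single character.
BAR : . ;
```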
the lexer rule BAR matches any single character in the range \u0000 .. \uFFFF, but it can never match the character 'a', since the lexer rule FOO is defined before the BAR rule and captures this 'a' already. And the parser rule prs again matches any token, which is either FOO or BAR.
#5
Putting single characters like 'u' inside your parser rules will cause the lexer to tokenize a u as a separate token: you don’t want that. Also, by putting them in parser rules, it is unclear which token has precedence over other tokens. You should keep all such literals outside your parser rules and make them explicit lexer rules instead. Only use lexer rules in your parser rules.
So, don’t write a literal such as 'u' directly in a parser rule, but give it a lexer rule of its own and reference that rule from the parser rule.
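
The difference can be sketched as follows. The rule names pRule and U are hypothetical, and the String rule is simplified so the fragment is self-contained:

```antlr
// Don't: the literal 'u' silently creates a token of its own.
//   pRule : 'u' ':' String ;

// Do: reference an explicit lexer rule instead.
pRule : U ':' String ;

// U is listed before String so that a lone 'u' becomes a U token.
U      : 'u';
String : ('a'..'z' | '0'..'9')+;
```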
You could make ':' a lexer rule, but that is of less importance. The 'u', however, can also be a String, so it must appear as a lexer rule before the String rule.
Okay, those were the most obvious things that come to mind. Based on them, here’s a proposed grammar:
that can be tested with the following class:
which, after generating the lexer and parser:
prints the following output:
As you can see, the last command, C :: expr = conn xy, produces an error, as expected.