I’m trying to implement a grammar for parsing queries. A single query consists of items, where each item is either a name or a name-ref.
A name is either mystring (only letters, no spaces) or "my long string" (letters and spaces, always quoted). A name-ref is very similar to a name; the only difference is that it starts with ref: (ref:mystring, ref:"my long string"). A query must contain at least 1 item (name or name-ref).
Here’s what I have:
NAME: ('a'..'z')+;
REF_TAG: 'ref:';
SP: ' '+;
name: NAME;
name_ref: REF_TAG name;
item: name | name_ref;
query: item (SP item)*;
This grammar demonstrates what I basically need; its only limitation is that it doesn’t support long quoted strings (it works fine for names that don’t have spaces). To add them, I tried:
SHORT_NAME: ('a'..'z')+;
LONG_NAME: SHORT_NAME (SP SHORT_NAME)*;
REF_TAG: 'ref:';
SP: ' '+;
Q: '"';
short_name: SHORT_NAME;
long_name: LONG_NAME;
name_ref: REF_TAG (short_name | (Q long_name Q));
item: (short_name | (Q long_name Q)) | name_ref;
query: item (SP item)*;
But that doesn’t work. Any ideas what the problem is? This is probably important: the input my first query should be treated as 3 items (3 names), while "my first query" is 1 item (1 long_name).
ANTLR’s lexer matches greedily: that is why input like my first query is being tokenized as LONG_NAME instead of 3 SHORT_NAMEs with spaces in between.
Simply remove the LONG_NAME rule and define it in the parser rule long_name.
The following grammar:
will parse the input:
as follows:
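For reference, removing the LONG_NAME lexer rule and building long_name in the parser might look roughly like the sketch below. It is untested and assumes ANTLR 3 syntax, as in the question; the grammar name Query and the helper rule name are made up for illustration, and the trailing EOF is an addition that forces the whole input to be consumed.

```
grammar Query;

// Parser rules
query      : item (SP item)* EOF ;
item       : name | name_ref ;
name_ref   : REF_TAG name ;
name       : short_name | Q long_name Q ;   // hypothetical helper rule
short_name : SHORT_NAME ;
// The long name is now assembled by the parser from SHORT_NAME and SP
// tokens, so the lexer can no longer swallow "my first query" greedily.
long_name  : SHORT_NAME (SP SHORT_NAME)* ;

// Lexer rules (no LONG_NAME any more)
REF_TAG    : 'ref:' ;
SHORT_NAME : ('a'..'z')+ ;
SP         : ' '+ ;
Q          : '"' ;
```

With a grammar along these lines, my first query should come out as three items separated by SP tokens, while "my first query" should be a single item whose long_name sits between two Q tokens.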
However, you could also tokenize a quoted name in the lexer and strip the quotes from it with a bit of custom code. Discarding spaces in the lexer, so that the parser never sees them, could also be an option. Something like this:
which would parse the same input as follows:
Note that the actual token LONG_NAME will be stripped of its start- and end-quote.
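A rough, untested sketch of that lexer-based variant, assuming ANTLR 3 with the Java target (the embedded action is Java code; setText, getText and skip are methods of the generated lexer's base class). The grammar name Query and the rule name are again only illustrative.

```
grammar Query;

// Parser rules: spaces never reach the parser, so items simply follow
// one another.
query    : item+ EOF ;
item     : name | name_ref ;
name_ref : REF_TAG name ;
name     : SHORT_NAME | LONG_NAME ;

// Lexer rules
REF_TAG    : 'ref:' ;
SHORT_NAME : ('a'..'z')+ ;
// Match the whole quoted name as one token, then drop the first and last
// character (the quotes) from the token text.
LONG_NAME  : '"' ('a'..'z' | ' ')+ '"'
             { setText(getText().substring(1, getText().length() - 1)); }
           ;
// Discard runs of spaces between tokens.
SP         : ' '+ { skip(); } ;
```

One trade-off of skipping spaces: input such as ref: mystring (with a space after the colon) would be accepted as well, because the parser can no longer tell whether a space was present.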