I’m trying to implement a grammar for parsing queries. Single query consists of items

Question

Asked: May 28, 2026 (2026-05-28T15:03:11+00:00)

I’m trying to implement a grammar for parsing queries. A single query consists of items, where each item is either a name or a name-ref.

A name is either mystring (letters only, no spaces) or "my long string" (letters and spaces, always quoted). A name-ref is the same as a name, except that it starts with ref: (ref:mystring, ref:"my long string"). A query must contain at least 1 item (name or name-ref).

Here’s what I have:

NAME: ('a'..'z')+;
REF_TAG: 'ref:';
SP: ' '+;

name: NAME;
name_ref: REF_TAG name;
item: name | name_ref;
query: item (SP item)*;

This grammar shows what I basically need. Its only limitation is that it doesn’t support long quoted strings (it works fine for names that don’t contain spaces). To support them, I tried this:

SHORT_NAME: ('a'..'z')+;
LONG_NAME: SHORT_NAME (SP SHORT_NAME)*;
REF_TAG: 'ref:';
SP: ' '+;
Q: '"';

short_name: SHORT_NAME;
long_name: LONG_NAME;
name_ref: REF_TAG (short_name | (Q long_name Q));
item: (short_name | (Q long_name Q)) | name_ref;
query: item (SP item)*;

But that doesn’t work. Any ideas what the problem is? This is probably important: the input my first query should be treated as 3 items (3 names), while "my first query" is 1 item (1 long_name).

1 Answer

Editorial Team · Answer 1 · 2026-05-28T15:03:13+00:00

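ANTLR’s lexer matches greedily, and it tokenizes the input without knowing what the parser expects at that point. That is why input like my first query is tokenized as a single LONG_NAME instead of 3 SHORT_NAMEs with spaces in between, whether or not it is quoted.

Simply remove the LONG_NAME lexer rule and describe the long name in the parser rule long_name instead.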
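To make the greedy behaviour concrete, here is a minimal Python sketch of a longest-match tokenizer using the lexer rules from the question, hand-translated to regular expressions. It is a simplified model and not ANTLR's actual algorithm; the names RULES and tokenize are made up for this illustration.

```python
import re

# Lexer rules from the question, in declaration order.
# Assumption: each rule is hand-translated to an equivalent regex.
RULES = [
    ("SHORT_NAME", re.compile(r"[a-z]+")),
    ("LONG_NAME", re.compile(r"[a-z]+(?: +[a-z]+)*")),
    ("REF_TAG", re.compile(r"ref:")),
    ("SP", re.compile(r" +")),
    ("Q", re.compile(r'"')),
]


def tokenize(text, rules=RULES):
    """Longest-match tokenizer; on a tie the rule declared first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


# With LONG_NAME present, the whole unquoted input becomes one token.
print(tokenize("my first query"))

# Without LONG_NAME, the same input splits into names and spaces.
without_long = [rule for rule in RULES if rule[0] != "LONG_NAME"]
print(tokenize("my first query", without_long))
```

In this model the LONG_NAME rule always wins as soon as a second word follows, so the parser never gets to see three separate SHORT_NAME tokens.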

The following grammar:

SHORT_NAME : ('a'..'z')+;
REF_TAG    : 'ref:';
SP         : ' '+;
Q          : '"';

short_name : SHORT_NAME;
long_name  : Q SHORT_NAME (SP SHORT_NAME)* Q;
name_ref   : REF_TAG (short_name | long_name);
item       : short_name | long_name | name_ref;
query      : item (SP item)*;

will parse the input:

my first query "my first query" ref:mystring

as follows:

[Image: parse tree of the input above]
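For readers without the image, the following Python sketch mimics the parser rules above with a small hand-written recursive-descent parser. It is only an illustration, not code generated by ANTLR. It assumes that the quotes belong to long_name alone (so name_ref is REF_TAG followed by short_name or long_name) and it adds an end-of-input check that the grammar itself does not state. The names TOKEN_RE, tokenize and Parser are hypothetical.

```python
import re

# Lexer: SHORT_NAME, REF_TAG, SP and Q only (no LONG_NAME token).
# REF_TAG is listed first so that "ref:" is not read as a SHORT_NAME.
TOKEN_RE = re.compile(r'(?P<REF_TAG>ref:)|(?P<SHORT_NAME>[a-z]+)|(?P<SP> +)|(?P<Q>")')


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else "EOF"

    def expect(self, kind):
        if self.peek() != kind:
            raise ValueError(f"expected {kind}, got {self.peek()}")
        value = self.tokens[self.i][1]
        self.i += 1
        return value

    def long_name(self):
        # long_name : Q SHORT_NAME (SP SHORT_NAME)* Q;
        self.expect("Q")
        words = [self.expect("SHORT_NAME")]
        while self.peek() == "SP":
            self.expect("SP")
            words.append(self.expect("SHORT_NAME"))
        self.expect("Q")
        return " ".join(words)  # runs of spaces collapse to one space

    def name(self):
        if self.peek() == "Q":
            return ("long_name", self.long_name())
        return ("short_name", self.expect("SHORT_NAME"))

    def item(self):
        # item : short_name | long_name | name_ref;
        if self.peek() == "REF_TAG":
            self.expect("REF_TAG")
            return ("name_ref", self.name()[1])
        return self.name()

    def query(self):
        # query : item (SP item)*;  plus an assumed end-of-input check
        items = [self.item()]
        while self.peek() == "SP":
            self.expect("SP")
            items.append(self.item())
        if self.peek() != "EOF":
            raise ValueError(f"unexpected {self.peek()} after query")
        return items


for parsed in Parser('my first query "my first query" ref:mystring').query():
    print(parsed)
```

Because the quote is its own token, the parser decides from context whether a run of SHORT_NAME and SP tokens forms one long name or several separate items.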

However, you could also tokenize a quoted name in the lexer and strip the quotes from it with a bit of custom code. Skipping spaces in the lexer, so that they never reach the parser, could also be an option. Something like this:

SHORT_NAME : ('a'..'z')+;
LONG_NAME  : '"' ~'"'* '"' {setText(getText().substring(1, getText().length()-1));};
REF_TAG    : 'ref:';
SP         : ' '+ {skip();};

name_ref   : REF_TAG (SHORT_NAME | LONG_NAME);
item       : SHORT_NAME | LONG_NAME | name_ref;
query      : item+ EOF;

which would parse the same input as follows:

[Image: parse tree of the same input]

Note that the text of the LONG_NAME token will have its opening and closing quotes stripped.
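The effect of the two lexer actions can be sketched in plain Python as follows. This is a hand-written approximation of the second grammar, not ANTLR output, and the names TOKEN_RE, tokenize and parse_query are invented for the example. The SP branch plays the role of skip(), and the slice on LONG_NAME plays the role of the setText(...) call.

```python
import re

# Assumption: regex equivalents of the lexer rules in the second grammar.
# REF_TAG comes first so that "ref:" is not read as a SHORT_NAME.
TOKEN_RE = re.compile(r'''
    (?P<REF_TAG>ref:)
  | (?P<SHORT_NAME>[a-z]+)
  | (?P<LONG_NAME>"[^"]*")
  | (?P<SP>\ +)
''', re.VERBOSE)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SP":
            continue  # like skip(): spaces never reach the parser
        if kind == "LONG_NAME":
            value = value[1:-1]  # like setText(...): drop both quotes
        tokens.append((kind, value))
    return tokens


def parse_query(text):
    """query : item+ EOF;  item : SHORT_NAME | LONG_NAME | name_ref;"""
    tokens = tokenize(text)
    items, i = [], 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "REF_TAG":
            # name_ref : REF_TAG (SHORT_NAME | LONG_NAME);
            if i + 1 >= len(tokens) or tokens[i + 1][0] == "REF_TAG":
                raise ValueError("'ref:' must be followed by a name")
            items.append(("name_ref", tokens[i + 1][1]))
            i += 2
        else:
            items.append(("name", value))
            i += 1
    if not items:
        raise ValueError("a query needs at least one item")
    return items


for parsed in parse_query('my first query "my first query" ref:mystring'):
    print(parsed)
```

One consequence of skipping spaces, in this sketch as in the grammar, is that an input such as ref: mystring (with a space after the colon) is accepted as a name_ref too.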
