I am using ANTLR 3 for the following.
Assume I have an SQL query. I know that, in general, its WHERE, ORDER BY and GROUP BY clauses are optional. In ANTLR grammar terms I would describe that like this:
query : select_clause from_clause where_clause? group_by_clause? order_by_clause?
The rule for each clause will obviously start with the corresponding keyword.
What I actually need is to extract each clause’s contents as a string without dealing with its internal structure.
To do this I started with the following grammar:
query :
select_clause from_clause where_clause? group_by_clause? order_by_clause?
EOF;
select_clause :
SELECT_CLAUSE
;
from_clause :
FROM_CLAUSE
;
where_clause :
WHERE_CLAUSE
;
group_by_clause :
GROUP_BY_CLAUSE
;
order_by_clause :
ORDER_BY_CLAUSE
;
SELECT_CLAUSE : 'select' ANY_CHAR*;
FROM_CLAUSE : 'from' ANY_CHAR*;
WHERE_CLAUSE : 'where' ANY_CHAR*;
GROUP_BY_CLAUSE : 'group by' ANY_CHAR*;
ORDER_BY_CLAUSE : 'order by' ANY_CHAR*;
ANY_CHAR : .;
WS : ' '+ {skip();};
This one didn’t work. I have made further attempts at writing a correct grammar, without success. I suspect this task is doable with ANTLR 3, but I am just missing something.
More generally, I would like to collect chars from the input stream into a single token until I meet a specific keyword that indicates the beginning of a new token. This keyword should be part of the new token.
Can you help me please?
Instead of making ANY_CHAR* part of your tokens, why not move it into your parser rules? You could even “glue” the resulting single-character tokens together using a rewrite rule.
A quick demo:
If you now parse the input:
the following AST would be created:
Trying to do something similar in your lexer would be messy: it would require custom code (or predicates) that checks for keywords further ahead in the char stream (neither of which is pretty!).
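For comparison, here is a minimal plain-Java sketch of that keyword lookahead, done outside ANTLR entirely. It is not ANTLR-generated code and not the demo above; the `ClauseSplitter` class, its keyword list, and the whole-word, case-insensitive matching are assumptions for illustration. It also assumes keywords never appear inside string literals, quoted identifiers or subqueries.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical plain-Java sketch (not ANTLR): splits a query into clauses, where
 * each clause starts with, and includes, one of the assumed KEYWORDS below.
 * Assumes keywords never occur inside string literals, quoted identifiers or subqueries.
 */
final class ClauseSplitter {
    // Assumed keyword list; "group by" and "order by" must use exactly one space.
    private static final String[] KEYWORDS = {"select", "from", "where", "group by", "order by"};

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** True if one of KEYWORDS starts at pos as a whole word (case-insensitive). */
    static boolean keywordAt(String input, int pos) {
        if (pos > 0 && isWordChar(input.charAt(pos - 1))) {
            return false; // pos is inside a word, e.g. the "from" in "data_from_table"
        }
        for (String kw : KEYWORDS) {
            int end = pos + kw.length();
            if (input.regionMatches(true, pos, kw, 0, kw.length())
                    && (end == input.length() || !isWordChar(input.charAt(end)))) {
                return true;
            }
        }
        return false;
    }

    /** Collects chars into the current clause until a keyword begins the next one. */
    static List<String> split(String input) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int pos = 0; pos < input.length(); pos++) {
            // A keyword closes the clause collected so far (unless nothing was collected yet)
            // and becomes the first part of the new clause.
            if (keywordAt(input, pos) && !current.toString().trim().isEmpty()) {
                clauses.add(current.toString().trim());
                current.setLength(0);
            }
            current.append(input.charAt(pos));
        }
        if (!current.toString().trim().isEmpty()) {
            clauses.add(current.toString().trim());
        }
        return clauses;
    }

    public static void main(String[] args) {
        String query = "SELECT a, b FROM t WHERE a > 1 GROUP BY a ORDER BY b";
        for (String clause : split(query)) {
            System.out.println(clause);
        }
    }
}
```

Running `main` prints `SELECT a, b`, `FROM t`, `WHERE a > 1`, `GROUP BY a` and `ORDER BY b` on separate lines. The loop is short, but it shows why the lexer version gets messy: deciding where a clause ends means peeking ahead for every keyword at every position, and a subquery such as `where x in (select y from z)` would still be split incorrectly.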