I’m working on an SQL grammar in ANTLR which allows quoted identifiers (table names, field names, etc.), as well as quoted literal strings.
The problem is that this grammar seems to always match quoted inputs as “QUOTED_LITERAL”, and never as IDs wrapped in quotes.
Here are my results:
- input: 'blahblah' result: string_literal, as expected
- input: field1 result: column_name, as expected
- input: table.field1 result: column_spec, as expected
- input: 'table'.'field1' result: string_literal, then a MissingTokenException
Below is my simplified grammar for the expression portion of the SQL grammar. Can anybody help identify what is needed so that quoted input can match the quoted identifier rules, and not only the quoted literal? Thanks.
grammar test;
expression
:
simpleExpression EOF!
;
simpleExpression
:
column_spec
| literal_value
;
column_spec
:
(table_name '.')? column_name
| ('\''table_name '\'''.')? '\'' column_name '\''
| ('\"'table_name '\"' '.')? '\"' column_name '\"'
;
string_literal: QUOTED_LITERAL ;
boolean_literal: 'TRUE' | 'FALSE' ;
literal_value :
(
string_literal
| boolean_literal
)
;
table_name :ID;
column_name :ID;
QUOTED_LITERAL:
( '\''
( ('\\' '\\') | ('\'' '\'') | ('\\' '\'') | ~('\'') )*
'\'' )
|
( '\"'
( ('\\' '\\') | ('\"' '\"') | ('\\' '\"') | ~('\"') )*
'\"' )
;
ID
:
( 'A'..'Z' | 'a'..'z' ) ( 'A'..'Z' | 'a'..'z' | '_' | '0'..'9'| '::' )*
;
WHITE_SPACE : ( ' '|'\r'|'\t'|'\n' ) {$channel=HIDDEN;} ;
In case anybody is interested, I removed a little bit of the flexibility from the quoted literal strings. Literal strings can only be quoted with single quotes, and identifiers can optionally be quoted with double quotes. As long as the literal quote and the identifier quote are well defined and don’t overlap, the grammar is trivial.
This policy makes the grammar much cleaner, and doesn’t remove the ability to quote identifiers. I make use of the JDBC method DatabaseMetaData.getIdentifierQuoteString to report which quote can be used to wrap identifiers.
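For anyone hitting the same problem, here is a minimal sketch of what the grammar might look like under the policy described above, assuming ANTLR 3 syntax as in the original grammar. The underlying issue is that the lexer tokenizes the input before the parser runs and has no knowledge of which parser rule is active, so a single-quoted input always comes out as one QUOTED_LITERAL token and the '\'' alternatives in column_spec are never reached. The QUOTED_ID lexer rule and the identifier parser rule are names introduced here for illustration and are not part of my original grammar. Escape handling is reduced to doubled quotes to keep the sketch short.

```antlr
grammar test;

expression
    : simpleExpression EOF
    ;

simpleExpression
    : column_spec
    | literal_value
    ;

// Quoting is handled entirely in the lexer, so the parser rule stays simple.
column_spec
    : (table_name '.')? column_name
    ;

literal_value
    : string_literal
    | boolean_literal
    ;

string_literal  : QUOTED_LITERAL ;
boolean_literal : 'TRUE' | 'FALSE' ;

table_name  : identifier ;
column_name : identifier ;

// Hypothetical helper rule: an identifier is either bare or double-quoted.
identifier
    : ID
    | QUOTED_ID
    ;

// Single quotes always mean a string literal ('' is an escaped quote).
QUOTED_LITERAL
    : '\'' ( '\'' '\'' | ~'\'' )* '\''
    ;

// Double quotes always mean an identifier ("" is an escaped quote).
// Assumption: the token text still includes the surrounding quotes,
// so later processing would need to strip them.
QUOTED_ID
    : '"' ( '"' '"' | ~'"' )* '"'
    ;

ID
    : ( 'A'..'Z' | 'a'..'z' ) ( 'A'..'Z' | 'a'..'z' | '_' | '0'..'9' | '::' )*
    ;

WHITE_SPACE : ( ' ' | '\r' | '\t' | '\n' ) {$channel=HIDDEN;} ;
```

With rules along these lines, "table"."field1" should lex as QUOTED_ID '.' QUOTED_ID and reach column_spec, while 'blahblah' still lexes as QUOTED_LITERAL, because the two quote characters no longer compete for the same input.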