For my own interest, I am writing an ANSI SQL lexer. Specifically, I am trying to conform to ISO/IEC 9075-2:2003(E). I ran into an ambiguity problem at the tokenizing stage.
The Lexical Elements section defines an interval string as follows:
<interval string> ::= <quote> <unquoted interval string> <quote>
<unquoted interval string> ::= [ <sign> ] { <year-month literal> | <day-time literal> }
<year-month literal> ::= <years value> [ <minus sign> <months value> ] | <months value>
<years value> ::= <datetime value>
<months value> ::= <datetime value>
<datetime value> ::= <unsigned integer>
<unsigned integer> ::= <digit>...
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
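To make the ambiguity concrete, here is a small Python sketch that lists every way a string can match the <year-month literal> rule above. year_month_parses is a hypothetical helper written for illustration only. It ignores the optional leading <sign> and models only the year-month branch; the <day-time literal> branch, not quoted here, would add further readings.

```python
import re

# Hypothetical helper illustrating the <year-month literal> rule quoted above.
# The optional leading <sign> and the <day-time literal> branch are left out.
UNSIGNED_INTEGER = re.compile(r"[0-9]+")  # <unsigned integer> ::= <digit>...


def year_month_parses(text: str) -> list[str]:
    """List every way `text` matches <year-month literal>."""
    parses = []

    # Alternative 1: <years value> [ <minus sign> <months value> ]
    years, sep, months = text.partition("-")
    if UNSIGNED_INTEGER.fullmatch(years):
        if not sep:
            parses.append(f"<years value>={years}")
        elif UNSIGNED_INTEGER.fullmatch(months):
            parses.append(f"<years value>={years} <months value>={months}")

    # Alternative 2: <months value> on its own
    if UNSIGNED_INTEGER.fullmatch(text):
        parses.append(f"<months value>={text}")

    return parses


print(year_month_parses("30"))   # ['<years value>=30', '<months value>=30']
print(year_month_parses("1-6"))  # ['<years value>=1 <months value>=6']
```

A bare run of digits satisfies both alternatives, while a value containing the minus sign can only be the first one, so the grammar alone cannot decide what a lone number means.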
Example:
'30'
Is the 30 a <years value> without the optional <minus sign> <months value> part, or is it a <months value>?
In theory I could write:
SELECT '30'
I created a YearsValue token class and a MonthsValue token class. However, the ambiguity is a problem: the text matches both. I don’t see anything that specifically deals with multiple matches in Part 1 or Part 2 of ISO/IEC 9075.
Can someone point out where the spec handles this, or is it simply assumed that matching proceeds left to right?
Before anyone asks, I am doing this because I want to write a SQL lexer. It’s not for school; it’s just something to educate myself. I don’t want to use GOLD or ANTLR either.
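One way to sidestep the double match between YearsValue and MonthsValue, offered as an assumption rather than anything the standard prescribes, is to lex the digits once as a single neutral token and postpone the years/months decision until the interval qualifier is known (for example YEAR or YEAR TO MONTH in INTERVAL '1-6' YEAR TO MONTH). DatetimeValue, MinusSign, lex_interval_body and bind_fields are hypothetical names. The qualifier is simplified to a list of field names, and the leading <sign> is not supported.

```python
import re
from dataclasses import dataclass


# Hypothetical token classes: one neutral DatetimeValue replaces the
# YearsValue/MonthsValue pair, since both match exactly the same digits.
@dataclass(frozen=True)
class DatetimeValue:
    digits: str


@dataclass(frozen=True)
class MinusSign:
    pass


TOKEN_RE = re.compile(r"(?P<num>[0-9]+)|(?P<minus>-)")


def lex_interval_body(text: str) -> list:
    """Tokenize the text between the quotes of a year-month interval string."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(DatetimeValue(m.group()) if m.lastgroup == "num" else MinusSign())
        pos = m.end()
    return tokens


def bind_fields(tokens: list, fields: list[str]) -> dict[str, int]:
    """Map the values, left to right, onto the qualifier's fields.

    Expects the shape: value [ - value ]... ; a leading <sign> is
    not supported by this sketch and is rejected.
    """
    values, separators = tokens[0::2], tokens[1::2]
    well_formed = (
        len(tokens) % 2 == 1
        and all(isinstance(v, DatetimeValue) for v in values)
        and all(isinstance(s, MinusSign) for s in separators)
    )
    if not well_formed:
        raise ValueError("expected value [ - value ]")
    if len(values) != len(fields):
        raise ValueError("value count does not match the interval qualifier")
    return {field: int(v.digits) for field, v in zip(fields, values)}


print(bind_fields(lex_interval_body("30"), ["YEAR"]))            # {'YEAR': 30}
print(bind_fields(lex_interval_body("30"), ["MONTH"]))           # {'MONTH': 30}
print(bind_fields(lex_interval_body("1-6"), ["YEAR", "MONTH"]))  # {'YEAR': 1, 'MONTH': 6}
```

With this split the lexer never has to guess: the same DatetimeValue("30") token becomes years or months depending only on which field the qualifier supplies.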
Based on my reading of a draft of SQL 2003, it is left ambiguous in a way that doesn’t matter. Yes, the grammar does not specify whether the 1 in INTERVAL '1' YEAR is a <years value> or a <months value>, or even perhaps a <days value>, but it really does not matter. The description of how YEAR is interpreted is clear that 1 is a number of years, even if it is parsed as a <months value>. The standard says that the first component in the value is mapped to the first field type in the interval type: