I have a column TypeCode varchar(20) that has values like ‘1’, ‘2’, ‘3’, ‘FOO’, ‘BAR’. I need to find the maximum integer which is less than the value of parameter. Something like this:
select max(TypeCode) TypeCode
from table1 a
left join table2 b on b.table1id = a.id
and b.TypeCode not in ('FOO', 'BAR')
where b.TypeCode < @MaxType
Which works most of the time, but in some queries SQL Server decides to convert it to something like this (according to the query plan).
select max(TypeCode) TypeCode
from table1 a
left join table2 b on b.table1id = a.id
and b.TypeCode < @MaxType
and b.TypeCode not in ('FOO', 'BAR')
That query obviously produces the following error:
Conversion failed when converting the varchar value 'FOO' to data type int.
I tried creating a view of table2 without the ‘FOO’ and ‘BAR’ values and joining the view instead but the query plan is still the same.
Do you know of a way to prevent the optimizer from changing the query?
PS: I know the design of the table is not the best but this is a legacy database and I can’t change it.
The real problem here is that you are combining character and integer semantics in the same query.
There are three possible solutions that I can think of:
Change the
@MaxTypeparameter to be avarchar(9). If you are only comparing against single-digit numbers, alphabetical order is OK. Otherwise, this won’t work.Use a
CASEorISNULL(NULLIF(...))construct in theWHEREstatement. This will work, but it is not sargable and will cause the optimizer to ignore any indexes you have on the type code. Not great.Created a persisted, computed, nullable integer column (i.e.
TypeCodeID) and index it separately. Put theCASEas the column’s expression. This will take some extra data/index space, but if you want good performance, that is the best way to go. Then instead of writingNOT IN ('Foo', 'Bar')you can just write the first condition (TypeCode < @MaxType) because the rows withFooandBarin theTypeCodecolumn will haveNULLin the newTypeCodeIDcolumn.I suppose there is also a fourth answer which is to change your design, and this would be the best idea, if it’s possible. If a column can contain character data then you really shouldn’t be trying to perform numerical comparisons. I have a strong suspicion that the character data exists in this column because it’s coming from users and isn’t validated/sanitized properly, so it’s got garbage values like
N/AorUnknownin there. If that is the case, your DB really should be enforcing data integrity, that’s what a DBMS is for. I know you say you “can’t change it”, but I don’t think this answer would be complete if I didn’t recommend against such dangerous practices.