I’m working on a query with a varchar column called ALCOHOL_OZ_PER_WK. Part of the query includes:
where e.ALCOHOL_OZ_PER_WK >= 14
and I get these errors:
Arithmetic overflow error converting varchar to data type numeric.
and:
Error converting data type varchar to numeric.
Looking into the values actually stored in the column, the largest look close to 100, but some of the entries are ranges:
9 - 12
1.5 - 2.5
I’d like to get the upper limit (or maybe the midpoint of the range) from rows with entries like this and have it be the value being compared to 14.
What would be the (or an) easy way to do this?
As always, thank you!
Your DB is obviously the result of some survey, and it seems to contain the original survey data. The usual way is to run this through an ECCD (Extract, Clean, Conform, Deliver) process and store the clean, standardized data in a separate database (maybe a warehouse), which can then be used for analytics and reporting.
If you have SSIS, use the Data Profiling task to get an idea of the types of strings you have in there. The Column Pattern Profile reports a set of regular expressions for the string column, so you will get an idea of what’s inside those strings. If you do not have SSIS, you can use eobjects DataCleaner to do the same.
If you cannot spare a new database, or at least a new table, then at minimum add a numeric column to this table and extract the numeric values from those strings into the new column. You may want to use “something else” (SSIS, Pentaho Kettle, Python, VB, C#) to do this; in general, T-SQL is not very good at string processing.
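To illustrate the "something else" route, here is a minimal Python sketch of the extraction step. It assumes the only two shapes in the column are a plain number ("20") or a "low - high" range ("9 - 12"); the function name parse_oz, the use_midpoint flag, and the regular expression are made up for this example, and real data will probably need more patterns. Anything it does not recognize comes back as None so it can be reviewed by hand rather than guessed at.

```python
import re
from typing import Optional

# A number such as "12", "1.5" or ".5" (hypothetical pattern; adjust to the real data).
_NUMBER = r"\d+(?:\.\d+)?|\.\d+"
# Either a single number or a "low - high" range, with optional surrounding spaces.
_ENTRY = re.compile(rf"^\s*({_NUMBER})\s*(?:-\s*({_NUMBER}))?\s*$")


def parse_oz(text: Optional[str], use_midpoint: bool = False) -> Optional[float]:
    """Return one number for '12' or '9 - 12'; None if the text is not understood."""
    if text is None:
        return None
    match = _ENTRY.match(text)
    if match is None:
        return None  # leave unparseable entries for manual review
    low = float(match.group(1))
    if match.group(2) is None:
        return low  # plain number, no range
    high = float(match.group(2))
    if use_midpoint:
        return (low + high) / 2
    return max(low, high)  # upper limit of the range
```

For example, parse_oz("9 - 12") gives 12.0 and parse_oz("1.5 - 2.5", use_midpoint=True) gives 2.0. The results would go into the new numeric column, and the comparison to 14 would then run against that column instead of the varchar one.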
My guess is that this is not the only column that has garbage inside, so any analysis that you may run on this may be worthless.
And if you still think that the ranges are the only problem, this example may help:
First some data
The query splits the records into two groups, numeric and non-numeric; the non-numeric ones are assumed to be ranges.
Returns: