The current project I’m working on uses an Oracle DBMS to store data. During development I found that date information is not stored in DATE columns, but in VARCHAR2 columns with some weird formatting. For example, look at this table:
CREATE TABLE "A_TABLE"
(
"OSERC_FEC_INICIO_OS" VARCHAR2(14 BYTE),
"OSERC_FEC_FIN_OS" VARCHAR2(14 BYTE),
"OSERC_FEC_REGISTRO_PETICION" VARCHAR2(14 BYTE),
"OSERC_FEC_APROBACION_PETICION" VARCHAR2(14 BYTE),
"OSERC_FEC_LIQUIDACION_OS" VARCHAR2(14 BYTE),
"OSERC_FEC_EJECUCION_OS" VARCHAR2(14 BYTE)
)
The fields OSERC_FEC_REGISTRO_PETICION, OSERC_FEC_APROBACION_PETICION, OSERC_FEC_LIQUIDACION_OS and OSERC_FEC_EJECUCION_OS store date information but are declared as VARCHAR2 columns. If you check the data, you’ll see that they use the format YYYYMMDDHHMMSS (year, month, day, hour, minute, second) to store that information.
I’m concerned because I need to build queries that use these dates in the WHERE clause, and I’m not sure how well indexes will perform with that approach. So, what are the problems with the design I described? Would it be better if the date fields were NUMBER instead of VARCHAR2?
It would be much better if the dates were stored as dates. Storing them as numbers rather than strings introduces a different set of problems.
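As a small illustration of why a real DATE is easier to work with than either a number or a string, here is a sketch using only literals, so it does not depend on your table. The numeric value 20120131 is just an example of a date encoded as YYYYMMDD.

```sql
-- DATE arithmetic understands the calendar: adding 1 moves to the next day.
SELECT DATE '2012-01-31' + 1 AS next_day FROM dual;       -- 1 February 2012

-- The same "date" encoded as a number does not: the result is not a valid date,
-- and nothing in a NUMBER column would stop it from being stored.
SELECT 20120131 + 1 AS not_a_date FROM dual;              -- 20120132
```

A VARCHAR2 column has the same weakness: it will accept any 14 characters, whether or not they form a real date.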
If you are absolutely stuck with dates stored as strings, then in order to allow indexes on the columns to be used, you’d need to convert the dates you’re using as parameters to strings in the appropriate format and then rely on the fact that strings in that particular format sort in the same order as the actual dates. If you ever compare the string to a date or to a number, you’re going to get implicit data type conversion which, at best, will lead to performance problems because indexes cannot be used and, at worst, will generate incorrect results or errors.
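A minimal sketch of that approach, assuming the stored values use a 24-hour clock, which in Oracle's format model is 'YYYYMMDDHH24MISS' (MI is minutes, MM is month). The index name is hypothetical, and the date literals stand in for whatever parameters your application passes.

```sql
-- Hypothetical index on one of the string-typed date columns.
CREATE INDEX a_table_fec_ejec_idx ON "A_TABLE" ("OSERC_FEC_EJECUCION_OS");

-- Index-friendly: the DATE values are converted to strings, so the column
-- is compared as stored, string to string.
SELECT *
  FROM "A_TABLE"
 WHERE "OSERC_FEC_EJECUCION_OS" >= TO_CHAR(DATE '2012-01-01', 'YYYYMMDDHH24MISS')
   AND "OSERC_FEC_EJECUCION_OS" <  TO_CHAR(DATE '2013-01-01', 'YYYYMMDDHH24MISS');

-- Not index-friendly: the conversion is applied to the column, so the plain
-- index above cannot be used, and any malformed string raises an error.
SELECT *
  FROM "A_TABLE"
 WHERE TO_DATE("OSERC_FEC_EJECUCION_OS", 'YYYYMMDDHH24MISS') >= DATE '2012-01-01';
```

The first query only works because the format is fixed-width and ordered from the most significant field (year) to the least (second), so string order and date order agree.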
Assuming you avoid data type conversion, the performance issues are likely to arise from the fact that the optimizer has a great deal of difficulty estimating cardinality when you use the wrong data type. Oracle knows, for example, that there are 366 days (or 8,784 hours or 527,040 minutes) between 1/1/2012 and 1/1/2013, since 2012 is a leap year. On the other hand, there are billions of possible strings between '20120101000000' and '20130101000000'. That can cause the optimizer not to use an index when you would like it to (or vice versa), to use the wrong sort of join, etc.
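To make the size of that gap concrete, here is a rough illustration using only literals. It is not a description of how the optimizer computes its estimates internally; it just shows how differently the same one-year range looks depending on the data type.

```sql
-- As DATEs, the range is a small, well-defined number of days.
SELECT DATE '2013-01-01' - DATE '2012-01-01' AS days_between
  FROM dual;                                               -- 366

-- Treated as 14-digit values, the same two bounds are ten billion apart,
-- and almost none of the values in between are valid dates.
SELECT TO_NUMBER('20130101000000') - TO_NUMBER('20120101000000') AS apparent_gap
  FROM dual;                                               -- 10000000000
```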