Background
- Microsoft SQL Server 2008 R2
- Table with ~100k records per day
- Most queries against this table filter on the date column described below
Problem
To improve performance a little, one option is to add an index on a date column but, instead of storing the date as a date type, to store it as an integer in the following format:
ddMMyyyy
**Edit: Changed the format to yyyyMMdd after looking at comments**
Question
- Do you think this is a good idea?
- Would you gain any improvements by doing this?
- Any possible drawbacks?
We are still in the design phase so we still have time to change this if we feel like it.
We expect to have a lot of queries filtering on this column, but IMO this won't give any performance improvement; it would be the same as having a Date column without a time component.
If you are trying to improve performance on the table, adding an additional column is a suspicious way to begin.
First, if the table already has a date column, then use that. In SQL Server the date type takes 3 bytes, so it is no larger than a 4-byte integer. More importantly, it gives you all sorts of date functionality built into the database: getting the month name, ordering by dates, calculating the number of days between dates, and so on.
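To make the point about built-in date functionality concrete, here is a minimal sketch in Python rather than T-SQL. The helper `to_yyyymmdd` and the sample dates are made up for illustration. It shows that subtracting two yyyyMMdd integers does not give the number of days between them, while real date values do.

```python
from datetime import date


def to_yyyymmdd(d: date) -> int:
    """Encode a date as a yyyyMMdd integer (hypothetical helper, illustration only)."""
    return d.year * 10000 + d.month * 100 + d.day


start = date(2013, 1, 31)
end = date(2013, 2, 1)

# Real date values: subtraction yields the true number of days.
days_between = (end - start).days

# Integer encoding: 20130201 - 20130131 is not a day count.
int_difference = to_yyyymmdd(end) - to_yyyymmdd(start)

print(days_between)    # 1
print(int_difference)  # 70
```

With an integer column, every such calculation would first need a conversion back to a date.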
Building an index is one approach for improving performance. I would also suggest that you look into partitioning the table. You probably don’t need to break up the table by day, but breaking it up by month would produce reasonably sized partitions (about 3 million rows).
In fact, if the querying is all on recent data, then I might suggest that you create a history table, which can be queried at leisure. Then keep the most recent data in a “current” table. You can have a process that runs every day to remove the oldest day of data from the current table and put those rows in the history table.
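The daily move from the current table to the history table could look roughly like the sketch below. It uses Python with an in-memory SQLite database purely as a stand-in for SQL Server, so the table names (`current_events`, `history_events`), the column names, and the ISO text dates are all assumptions for illustration. In SQL Server the column would be a real date and the job would typically be a scheduled T-SQL script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_events (id INTEGER PRIMARY KEY, event_date TEXT, payload TEXT);
    CREATE TABLE history_events (id INTEGER PRIMARY KEY, event_date TEXT, payload TEXT);
    CREATE INDEX ix_current_event_date ON current_events (event_date);
""")
conn.executemany(
    "INSERT INTO current_events (event_date, payload) VALUES (?, ?)",
    [("2013-01-01", "a"), ("2013-01-01", "b"), ("2013-01-02", "c"), ("2013-01-03", "d")],
)
conn.commit()


def archive_oldest_day(conn):
    """Move all rows for the oldest day from current to history; return that day."""
    oldest = conn.execute("SELECT MIN(event_date) FROM current_events").fetchone()[0]
    if oldest is None:
        return None  # nothing to archive
    with conn:  # both statements commit together, or roll back on error
        conn.execute(
            "INSERT INTO history_events (id, event_date, payload) "
            "SELECT id, event_date, payload FROM current_events WHERE event_date = ?",
            (oldest,),
        )
        conn.execute("DELETE FROM current_events WHERE event_date = ?", (oldest,))
    return oldest


moved_day = archive_oldest_day(conn)
current_count = conn.execute("SELECT COUNT(*) FROM current_events").fetchone()[0]
history_count = conn.execute("SELECT COUNT(*) FROM history_events").fetchone()[0]
print(moved_day, current_count, history_count)
```

Doing the insert and the delete in one transaction keeps a row from being lost or duplicated if the job fails halfway.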
In any case, as the comments suggest, `ddMMyyyy` is an unreasonable format. It works for equality, but not for `between` or `order by`.
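A small Python sketch shows why the digit order matters. The helper names and sample dates are invented for illustration. Sorting and range filtering on ddMMyyyy integers give wrong results, while yyyyMMdd integers behave like the dates they encode.

```python
from datetime import date


def as_ddmmyyyy(d: date) -> int:
    """Encode as a ddMMyyyy integer (illustrative helper)."""
    return d.day * 1000000 + d.month * 10000 + d.year


def as_yyyymmdd(d: date) -> int:
    """Encode as a yyyyMMdd integer (illustrative helper)."""
    return d.year * 10000 + d.month * 100 + d.day


# Already in chronological order.
dates = [date(2012, 12, 31), date(2013, 1, 15), date(2013, 2, 1)]

# Ordering: ddMMyyyy sorts by day of month first, so chronology is lost.
sorted_ddmmyyyy = sorted(dates, key=as_ddmmyyyy)
sorted_yyyymmdd = sorted(dates, key=as_yyyymmdd)

# Range filter for January 2013, written as a BETWEEN-style comparison.
jan_start, jan_end = date(2013, 1, 1), date(2013, 1, 31)
in_range_ddmmyyyy = [
    d for d in dates
    if as_ddmmyyyy(jan_start) <= as_ddmmyyyy(d) <= as_ddmmyyyy(jan_end)
]
in_range_yyyymmdd = [
    d for d in dates
    if as_yyyymmdd(jan_start) <= as_yyyymmdd(d) <= as_yyyymmdd(jan_end)
]

print(sorted_ddmmyyyy == dates)  # False: 1 Feb 2013 sorts first
print(sorted_yyyymmdd == dates)  # True
print(in_range_ddmmyyyy)         # wrongly includes 1 Feb 2013
print(in_range_yyyymmdd)         # only 15 Jan 2013
```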