I’m not a DBA (“Good!”, you’ll be thinking in a moment.)
I have a table of logging data with these characteristics and usage patterns:
- A
datetimecolumn for storing log timestamps whose value is ever-increasing and mostly (but only mostly) unique - Frequent-ish inserts (say, a dozen a minute), only at the end of the timestamp range (new data being logged)
- Infrequent deletes, in bulk, from the beginning of the timestamp range (old data being cleared)
- No updates at all
- Frequent-ish selects using the timestamp column as the primary criterion, along with secondary criteria on other columns
- Infrequent selects using other columns as the criteria (and not including the timestamp column)
- A good amount of data, but nowhere near enough that I’m worried much about storage space
Additionally, there is currently a daily maintenance window during which I could do table optimization.
I frankly don’t expect this table to challenge the server it’s going to be on even if I mis-index it a bit, but nevertheless it seemed like a good opportunity to ask for some input on SQL Server clustered indexes.
I know that clustered indexes determine the storage of the actual table data (the data is stored in the leaf nodes of the index itself), and that non-clustered indexes are separate pointers into the data. So in query terms, a clustered index is going to be faster than a non-clustered index — once we’ve found the index value, the data is right there. There are costs on insert and delete (and of course an update changing the clustered index column’s value would be particularly costly).
But I read in this answer that deletes leave gaps that don’t get cleaned up until/unless the index is rebuilt.
All of this suggests to me that I should:
- Put a clustered index on the timestamp column with a 100% fill-factor
- Put non-clustered indexes on any other column that may be used as a criterion in a query that doesn’t also involve the clustered column (which may be any of them in my case)
- Schedule the bulk deletes to occur during the daily maintenance interval
- Schedule a rebuild of the clustered index to occur immediately after the bulk delete
- Relax and get out more
Am I wildly off base there? Do I need to frequently rebuild the index like that to avoid lots of wasted space? Are there other obvious (to a DBA) things I should be doing?
Thanks in advance.
I agree with putting the clustered index on the timestamp column. My query would be on the fillfactor – 100% gives best read performance at the expense of write performance. you may be hurt by page splits. Choosing a lower fillfactor will delay page splitting at the expense of read performance so its a fine balancing act to get the best for your situation.
After the bulk deletes its worth rebuilding the indexes and updating statistics. This not only keeps performance up but also resets the indexes to the specified fillfactor.
Finally, yes put nonclustered indexes on other appropriate columns but only ones that are very select e.g not bit fields. But remember the more indexes, the more this affects write performance