I have a table in SQL Server where I want inserts to be added to the end of the table (as opposed to a clustering key that would cause them to be inserted in the middle). This means I want the table clustered by some column that will constantly increase.
This could be achieved by clustering on a datetime column:
CREATE TABLE Things (
...
CreatedDate datetime DEFAULT getdate(),
[timestamp] timestamp,
CONSTRAINT [IX_Things] UNIQUE CLUSTERED (CreatedDate)
)
But I can’t guarantee that two Things won’t have the same time. So my requirements can’t really be achieved by a datetime column.
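To illustrate the collision, here is a minimal sketch. The table name ThingsByDate and the Name column are hypothetical stand-ins for the elided columns, and the explicit date values are only there to force a duplicate; in practice two rapid inserts relying on the getdate() default could collide the same way.

```sql
-- Hypothetical demo table; Name stands in for the elided columns.
CREATE TABLE ThingsByDate (
    Name        varchar(50) NOT NULL,
    CreatedDate datetime    NOT NULL DEFAULT getdate(),
    CONSTRAINT IX_ThingsByDate UNIQUE CLUSTERED (CreatedDate)
);

-- Two rows with the same CreatedDate: the first INSERT succeeds,
-- the second fails with a unique key violation.
INSERT INTO ThingsByDate (Name, CreatedDate) VALUES ('first',  '2010-01-01T12:00:00');
INSERT INTO ThingsByDate (Name, CreatedDate) VALUES ('second', '2010-01-01T12:00:00');
```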
I could add a dummy identity int column, and cluster on that:
CREATE TABLE Things (
...
RowID int IDENTITY(1,1),
[timestamp] timestamp,
CONSTRAINT [IX_Things] UNIQUE CLUSTERED (RowID)
)
But you’ll notice that my table already contains a timestamp column, a column which is guaranteed to be monotonically increasing. This is exactly the characteristic I want for a candidate cluster key.
So I cluster the table on the rowversion (aka timestamp) column:
CREATE TABLE Things (
...
[timestamp] timestamp,
CONSTRAINT [IX_Things] UNIQUE CLUSTERED ([timestamp])
)
Rather than adding a dummy identity int column (RowID) to ensure an order, I use what I already have.
What I’m looking for are thoughts on why this is a bad idea, and what other ideas are better.
Note: Community wiki, since the answers are subjective.
That might sound like a good idea at first – but it’s really almost the worst option you have. Why?
The main requirements for a clustered key are (see Kim Tripp’s blog post for more excellent details): it should be stable (never changing), narrow, unique, and ideally ever-increasing.
Your rowversion violates the stable requirement, and that’s probably the most important one. The rowversion of a row changes with each modification to the row, and since your clustering key is added to each and every non-clustered index in the table, your server will be constantly updating loads of non-clustered indices and wasting a lot of time doing so.
In the end, adding a dummy identity column is probably a much better alternative for your case. The second-best choice would be the datetime column, but there you run the risk of SQL Server having to add “uniqueifiers” to your entries when duplicates occur. With a 3.33 ms accuracy, this could definitely happen. That is not optimal, but it is definitely much better than the rowversion idea.
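If you want to see the instability for yourself, a rough sketch along these lines should do it. The table name ThingsDemo and the Name column are made up for illustration (they stand in for your elided columns); the table is clustered on the identity column, as suggested above, with the rowversion kept as an ordinary column.

```sql
-- Hypothetical demo table; Name stands in for the elided columns.
CREATE TABLE ThingsDemo (
    RowID       int IDENTITY(1,1) NOT NULL,
    Name        varchar(50)       NOT NULL,
    [timestamp] rowversion        NOT NULL,
    CONSTRAINT IX_ThingsDemo UNIQUE CLUSTERED (RowID)
);

INSERT INTO ThingsDemo (Name) VALUES ('widget');

-- Note the rowversion value before the update.
SELECT RowID, [timestamp] FROM ThingsDemo;

UPDATE ThingsDemo SET Name = 'gadget' WHERE RowID = 1;

-- RowID is unchanged, but [timestamp] now holds a new, larger value.
SELECT RowID, [timestamp] FROM ThingsDemo;
```

Had the table been clustered on [timestamp] instead, that UPDATE would have changed the clustering key of the row, and with it the row locator stored in every non-clustered index.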