I have some reporting activities that periodically deliver new data. My current strategy is to delete the old data and then insert the new. I use range queries to move the reporting data in batches over a period of time.
My insert performance should be excellent, since all I’m doing is appending to an ever-increasing clustered key. The key is a datetime2(7) column with sysdatetime() as its default value.
However, I’m worried about fragmentation problems.
The oldest data is written first, but eventually it gets deleted, and the new data that replaces it is appended to the end.
My data should effectively roll into the future as it is updated.
I fully expect all old data to eventually get deleted.
Do I still have to worry about fragmentation, or would this do? I suspect this will perform well, but I’m still somewhat worried that SQL Server won’t be able to reclaim the deleted space.
I understand that you will insert and delete in clustered index order. This design is very reasonable. You might still get fragmentation on inserts after a while because the inserts will reuse deleted pages. There might well be anomalies like single pages not being freed or other miscellaneous pages being present in the range used for inserting. In that sense, fragmentation causes more fragmentation as a stochastic process.
The only way to guarantee no fragmentation is to partition your data coarsely and put each partition in a new filegroup. This ensures that inserts always go to the end of the file (there is nowhere else to put them). Also, deletes will eventually empty an entire partition, at which point the whole partition becomes eligible for removal.
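A rough sketch of what that partitioning could look like is below. All names here (dbo.ReportData, CreatedAt, Payload, pf_ReportMonth, ps_ReportMonth, the fg_* filegroups) are made up for illustration, the monthly boundaries are just one possible granularity, and the filegroups are assumed to already exist in the database. TRUNCATE TABLE ... WITH (PARTITIONS ...) needs SQL Server 2016 or later.

```sql
-- Hypothetical monthly partitioning; RANGE RIGHT puts each boundary value
-- in the partition to its right.
CREATE PARTITION FUNCTION pf_ReportMonth (datetime2(7))
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

-- Three boundaries give four partitions, so four filegroups are listed.
-- The fg_* filegroups are assumed to have been created beforehand.
CREATE PARTITION SCHEME ps_ReportMonth
AS PARTITION pf_ReportMonth
TO (fg_old, fg_2024_01, fg_2024_02, fg_2024_03);

-- Hypothetical table, clustered on the ever-increasing datetime2(7) key.
CREATE TABLE dbo.ReportData
(
    CreatedAt datetime2(7) NOT NULL
        CONSTRAINT DF_ReportData_CreatedAt DEFAULT SYSDATETIME(),
    Payload   varchar(200) NOT NULL
) ON ps_ReportMonth (CreatedAt);

CREATE CLUSTERED INDEX CIX_ReportData_CreatedAt
    ON dbo.ReportData (CreatedAt)
    ON ps_ReportMonth (CreatedAt);

-- Once every row in January 2024 (partition 2) is obsolete, empty the
-- whole partition at once instead of deleting row by row.
TRUNCATE TABLE dbo.ReportData WITH (PARTITIONS (2));

-- Then remove the now-unneeded boundary so the empty partition goes away.
ALTER PARTITION FUNCTION pf_ReportMonth() MERGE RANGE ('2024-01-01');
```

New boundaries for future months would be added with ALTER PARTITION SCHEME ... NEXT USED followed by ALTER PARTITION FUNCTION ... SPLIT RANGE, ideally while the last partition is still empty.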
Do you have nonclustered indexes? They might get fragmented too, because their keys are usually not inserted and deleted in index order.
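Rather than guessing, you can measure fragmentation for the clustered index and any nonclustered indexes. A minimal query, assuming a hypothetical table named dbo.ReportData (substitute your own), might look like this:

```sql
-- Fragmentation per index and partition for one table in the current database.
-- dbo.ReportData is a placeholder name.
SELECT
    i.name AS index_name,
    ps.index_type_desc,
    ps.partition_number,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                        -- current database
         OBJECT_ID(N'dbo.ReportData'),   -- table to inspect
         NULL,                           -- all indexes
         NULL,                           -- all partitions
         'LIMITED'                       -- cheapest scan mode
     ) AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

If the numbers do creep up over time, ALTER INDEX ... REORGANIZE or ALTER INDEX ... REBUILD on the affected index is the usual remedy. Percentages on indexes with very few pages tend not to be meaningful.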