I’ve got a table that records information about customer contacts. The table is meant to hold only “recent” contacts, so I would like to delete all records for contacts older than 3 weeks.
For example, the table is:
create table recent_contact (
    recent_contact_id int identity (1,1) primary key,
    contact_text nvarchar(4000),
    created datetime
)
create index createdIndex
on recent_contact (created)
All inserts to this table will happen via a stored procedure that just does an INSERT statement.
My question is about cleanup. I would like to delete all items older than 3 weeks. So far I have thought of 2 ways to accomplish cleanup.
- Have a background database job run periodically (e.g. every 5 hours) that will scan the above table and delete anything older than 3 weeks.
- In the insert() stored procedure call, add the logic to clear out old data. This should only add constant time overhead since the table is indexed on [created] and each item is inserted once and deleted only once. So on average this sproc will do 1 insert and 1 delete.
-- insert
insert into recent_contact (contact_text, created)
values (@text, @createDate)
declare @threeWeeksAgo datetime
set @threeWeeksAgo = DATEADD(DAY, -21, GETDATE())
-- remove old items
delete from recent_contact
where created < @threeWeeksAgo
Of the two options, I went with option 2) because I felt it was a more elegant solution and wouldn’t require a separate cleanup job. My coworker told me that this was bad practice and that retention policy should always be in a separate job that runs periodically. I.e. he thought option 1) was the better option.
I’m wondering what other people think? Generally speaking, what are the best practices for enforcing data retention policies?
Do 1). Option 2) is a misguided idea. There is no reason to shun the periodic job, but there are plenty of reasons to avoid punishing every single INSERT with the cost of looking up stale entries. Even more punishing, an INSERT will randomly hit a spike in response time because it was the unlucky winner of the lottery ticket to clean up some entries. A scheduled job, on the other hand, can be scheduled at convenient hours. And, last but not least, consider that your ‘clever’ design requires an INSERT in order for maintenance to occur: if the inserts stop, the stale rows stay in the table.
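As a rough illustration of option 1), the body of such a job could look something like the sketch below. It deletes in small batches so that each statement stays short. The batch size of 5000 is an assumed value to tune, not a recommendation, and how the job gets scheduled (SQL Server Agent or anything else) is left out.

```sql
-- Sketch of a cleanup step for a periodic job; names follow the question's table.
declare @threeWeeksAgo datetime
set @threeWeeksAgo = DATEADD(DAY, -21, GETDATE())

declare @batchSize int
set @batchSize = 5000   -- assumed value, tune for your workload

declare @deleted int
set @deleted = 1

-- keep deleting small batches until nothing older than the cutoff is left
while @deleted > 0
begin
    delete top (@batchSize)
    from recent_contact
    where created < @threeWeeksAgo

    set @deleted = @@ROWCOUNT
end
```

Small batches keep each transaction short, so concurrent INSERTs are less likely to wait behind one large delete.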
In time you will learn that, due to index tipping point issues, cleaning up data past its retention period is actually a very tricky problem, and the bodies of many developers pave that road. You will also discover that time series like a clustered index on the time column, not least because of the obsolete data cleanup issues.
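For the table in the question, clustering on the time column might look roughly like the sketch below. The constraint name PK_recent_contact is made up for illustration, and whether the primary key should stay as a nonclustered index at all depends on how the rows are looked up.

```sql
-- Sketch: same columns as in the question, but clustered by [created].
create table recent_contact (
    recent_contact_id int identity (1,1) not null,
    contact_text nvarchar(4000),
    created datetime,
    -- hypothetical constraint name; the key is kept, but nonclustered
    constraint PK_recent_contact primary key nonclustered (recent_contact_id)
)

-- rows are now physically ordered by time, so old rows sit together
create clustered index createdIndex
on recent_contact (created)
```

With this layout a delete on created < @threeWeeksAgo is a range scan at one end of the clustered index, so it does not depend on the optimizer choosing a nonclustered index.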