I have a model called HeroStatus with the following attributes:
- id
- user_id
- recordable_type
- hero_type (can be NULL!)
- recordable_id
- created_at
There are over 100 hero_statuses, and a user can have many hero_statuses, but can’t have the same hero_status more than once.
A user’s hero_status is uniquely identified by the combination of recordable_type + hero_type + recordable_id. What I’m trying to say essentially is that there can’t be a duplicate hero_status for a specific user.
Unfortunately, I didn’t have a validation in place to ensure this, so I ended up with some duplicate hero_statuses for users after I made some code changes. For example:
user_id = 18
recordable_type = 'Evil'
hero_type = 'Halitosis'
recordable_id = 1
created_at = '2010-05-03 18:30:30'

user_id = 18
recordable_type = 'Evil'
hero_type = 'Halitosis'
recordable_id = 1
created_at = '2009-03-03 15:30:00'

user_id = 18
recordable_type = 'Good'
hero_type = 'Hugs'
recordable_id = 1
created_at = '2009-02-03 12:30:00'

user_id = 18
recordable_type = 'Good'
hero_type = NULL
recordable_id = 2
created_at = '2009-012-03 08:30:00'
(The last two are obviously not duplicates; the first two are.) So what I want to do is get rid of the duplicate hero_status. Which one? The one with the most recent date.
I have three questions:
- How do I remove the duplicates using a SQL-only approach?
- How do I remove the duplicates using a pure Ruby solution? Something similar to this: Removing "duplicate objects".
- How do I put a validation in place to prevent duplicate entries in the future?
For an SQL-only approach, I would use this query (I’m assuming the ids are unique):
A bit of a monster! The query finds all duplicates using the natural key (user_id, recordable_type, hero_type, recordable_id) and selects the one with the largest created_at value (the most recently created). It then finds the ids of those rows (by joining back to the main table) and deletes the rows with those ids.
(Please try this on a copy of the table first and verify you get the results you want! 🙂)
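The selection logic described above can also be sketched in plain Ruby, which may help when checking what would be deleted before touching the real table. This is only an illustration on in-memory data: `Row` and `duplicate_ids_to_delete` are hypothetical names, the ids are made up, and the last row's date is a placeholder.

```ruby
require "time"

# Hypothetical in-memory stand-in for a hero_statuses row (not an ActiveRecord model).
Row = Struct.new(:id, :user_id, :recordable_type, :hero_type, :recordable_id, :created_at)

# Returns the ids of the rows to delete: for every key that occurs more
# than once, the id of the most recently created row in that group.
def duplicate_ids_to_delete(rows)
  rows
    .group_by { |r| [r.user_id, r.recordable_type, r.hero_type, r.recordable_id] }
    .values
    .select { |group| group.size > 1 }
    .map { |group| group.max_by(&:created_at).id }
end

# Sample data modelled on the rows in the question (ids are assumed).
rows = [
  Row.new(1, 18, "Evil", "Halitosis", 1, Time.parse("2010-05-03 18:30:30")),
  Row.new(2, 18, "Evil", "Halitosis", 1, Time.parse("2009-03-03 15:30:00")),
  Row.new(3, 18, "Good", "Hugs",      1, Time.parse("2009-02-03 12:30:00")),
  Row.new(4, 18, "Good", nil,         2, Time.parse("2009-01-03 08:30:00")) # placeholder date
]

p duplicate_ids_to_delete(rows)
```

Like the approach described above, one pass removes only the newest row per key, so a key with three or more copies would need the step repeated until nothing is returned. Note that `nil` works as part of the grouping key here, whereas in SQL a NULL hero_type needs care when joining back to the table.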
To prevent this happening in future, add a unique index or constraint over the columns user_id, recordable_type, hero_type, recordable_id. E.g.
EDIT:
You add (and remove) this index within a migration like this:
Or, if you want to explicitly name it:
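As a rough sketch of what such a migration might look like with an explicit index name: the class name, the `[7.0]` version tag, and the index name below are assumptions rather than anything from the original answer, and older Rails versions use `self.up`/`self.down` on a plain `ActiveRecord::Migration` instead. It is meant to live in a Rails app's `db/migrate` directory, not to run standalone.

```ruby
# Hypothetical migration: class name, version tag and index name are assumptions.
class AddUniqueIndexToHeroStatuses < ActiveRecord::Migration[7.0]
  COLUMNS = [:user_id, :recordable_type, :hero_type, :recordable_id].freeze
  INDEX_NAME = "index_hero_statuses_uniqueness"

  def up
    # Rejects a second row with the same combination of the four columns.
    add_index :hero_statuses, COLUMNS, unique: true, name: INDEX_NAME
  end

  def down
    remove_index :hero_statuses, name: INDEX_NAME
  end
end
```

Leaving out the `name:` option lets Rails generate a name from the table and columns. One caveat: several databases, including MySQL and PostgreSQL by default, treat NULLs as distinct in a unique index, so rows whose hero_type is NULL may still be duplicated even with this index in place.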