I have this SQL query, which selects the duplicates it finds in the table by movie_name:
SQL:
SELECT movies.movie_name, movies.year FROM movies
INNER JOIN (SELECT movie_name FROM movies
GROUP BY movie_name HAVING count(movie_id) > 1) dup ON movies.movie_name = dup.movie_name
I also want to test for the same year, not just movie_name, i.e. movies.year = dup.year.
Is this possible?
Grouping the subquery on both movie_name and year, and joining on both columns, would seem to be a reasonable start…
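One possible form of that start is sketched below. It assumes the only change needed is to carry year through the subquery and the join; selecting movie_id and adding the ORDER BY are additions for readability, not part of the original query. Rows where year is NULL will not match each other in the join, so they would not be reported as duplicates.

```sql
-- Sketch: treat rows as duplicates only when movie_name AND year both match.
SELECT movies.movie_id, movies.movie_name, movies.year
FROM movies
INNER JOIN (
    -- every (movie_name, year) pair that occurs more than once
    SELECT movie_name, year
    FROM movies
    GROUP BY movie_name, year
    HAVING COUNT(movie_id) > 1
) dup
    ON movies.movie_name = dup.movie_name
   AND movies.year = dup.year
ORDER BY movies.movie_name, movies.year, movies.movie_id;
```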
By deleting one I assume you mean keeping one. Don’t forget you could have more than one duplicate.
Let’s say we’ll keep the first one and get rid of the rest, and that the one with the earliest movie_id is the first.
So selecting the lowest movie_id for each movie_name and year
would give you all the ones to keep.
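The "ones to keep" query could look something like this. It is a sketch that assumes movie_id is unique and that "earliest" means the smallest movie_id; keep_id is just an illustrative alias.

```sql
-- One row per (movie_name, year): the movie_id we intend to keep.
-- Rows that have no duplicate are their own minimum, so they are kept too.
SELECT MIN(movie_id) AS keep_id
FROM movies
GROUP BY movie_name, year;
```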
Next, select all those records which are in movies but not in the query of all those we want to keep.
That gives us all the ones you want to get rid of. For Cthulhu’s sake don’t trust me or yourself on this! Take a backup before you do the delete!
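A sketch of that "in movies but not in the keep list" select follows, under the same assumptions as before (movie_id unique and never NULL, which matters because NOT IN behaves surprisingly when the list contains NULLs). Run it and eyeball the results before going any further.

```sql
-- Everything this returns is a row that would be removed.
SELECT movie_id, movie_name, year
FROM movies
WHERE movie_id NOT IN (
    -- the rows to keep: lowest movie_id per (movie_name, year)
    SELECT MIN(movie_id)
    FROM movies
    GROUP BY movie_name, year
)
ORDER BY movie_name, year, movie_id;
```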
So now we’ve taken the query we proved (you did prove it, didn’t you?) and instead of selecting the offending records we are deleting them.
Don’t forget the backup!
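The delete could then be sketched as below. The thread does not say which database is in use, so the keep list is wrapped in a derived table (the alias keepers is made up for illustration): MySQL refuses a DELETE whose subquery reads directly from the table being deleted from, and the extra wrapper is harmless elsewhere.

```sql
-- Same condition as the proven SELECT, now used to delete.
DELETE FROM movies
WHERE movie_id NOT IN (
    SELECT keep_id
    FROM (
        -- lowest movie_id per (movie_name, year) survives
        SELECT MIN(movie_id) AS keep_id
        FROM movies
        GROUP BY movie_name, year
    ) AS keepers
);
```

If the database supports transactions, running this inside one and checking the row count before committing gives a second safety net on top of the backup.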