I saw the solution of creating an alternate temporary MySQL table with unique rows, but I didn’t like that idea: my tables are very large, so moving them would be a hassle (and would create huge problems if there were errors during the move).
I did, however, find the following. What do you think of this (where the column to check for duplicates is “field_name”)?
DELETE FROM table1
USING table1, table1 as vtable
WHERE (NOT table1.ID=vtable.ID)
AND (table1.field_name=vtable.field_name)
Somebody said this should work, but I’m not quite sure. What do you think? Also, would indexes affect the performance of this command, say, an index on “field_name”?
EDIT: Would there be any way to test the query before running it? As far as I know, MySQL doesn’t support “explain” on DELETE queries.
Note that the query you show will delete both duplicates. I would assume you want to keep one or the other.
Here’s how I would write this query:
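A sketch of that rewrite, assuming the table and column names from the question (table1, ID, field_name); the aliases t1 and t2 are just two names for the same table:

```sql
-- Assumption: table1, ID and field_name are the names used in the question.
-- t1 is the row that may be deleted; t2 is any other row with the same field_name.
DELETE t1
FROM table1 AS t1
JOIN table1 AS t2
  ON t1.field_name = t2.field_name
WHERE t1.ID > t2.ID;  -- only rows that have a lower-ID twin match, so the lowest ID survives
```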
By using greater-than instead of not-equals-to, you only delete the later rows (those with a higher id than a matching row) and keep the earliest one, instead of deleting all of them.
A compound index over (id, field_name) may help. You should confirm this with MySQL’s EXPLAIN to get the optimization report. But before MySQL 5.6.3, EXPLAIN only supports SELECT queries, so you should run an equivalent SELECT to confirm the optimization.
You also asked about testing. I’d recommend copying a sample of rows containing duplicates to a table in your test database.
Now you can perform experiments on your sample data until you’re satisfied the DELETE solution is correct.
I’d recommend naming your scratch table in the test database something distinct from your real table in your real database, just in case you run an experimental DELETE while you are accidentally still using your real database as the default database!
Re your comments:
USE test is a mysql client builtin command. It sets the test database as the default database. This will be the default database when you name tables in your queries without qualifying them with a database name. See http://dev.mysql.com/doc/refman/5.1/en/use.html
SET autocommit = 0 turns off the default behavior of committing a transaction for each query implicitly. So you must explicitly give the COMMIT or ROLLBACK command to finish a transaction. See http://dev.mysql.com/doc/refman/5.1/en/commit.html
It’s worthwhile to use ROLLBACK when you’re experimenting because it discards the changes made in that transaction. It’s a quick way to return to the initial state of your test data so you can try another experiment.
DELETE t1 is not a typo. DELETE deletes rows, not whole tables. t1 is an alias to each row that satisfies the conditions of the statement (although it is possible that the conditions include every row in the table). See the description of multi-table delete at http://dev.mysql.com/doc/refman/5.1/en/delete.html
Sort of like when you run a loop in PHP and you use a variable to iterate over the loop: for ($i=0; $i<100; ++$i)… The variable $i takes on a series of values, and each time through the loop it has a different value.
Here’s a demo showing how my solution deletes multiple duplicates. I ran this in my test database and I’m pasting the result directly from my command window:
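The following is a sketch of such a demo rather than the original session output. It assumes a hypothetical scratch table named table1_test in the test database, made-up sample rows, and the InnoDB engine so that ROLLBACK has something to undo. The comments about row ids describe what the statements should leave behind for this sample data.

```sql
USE test;

-- Hypothetical scratch table, deliberately named differently from the real table.
CREATE TABLE table1_test (
  id         INT         NOT NULL PRIMARY KEY,
  field_name VARCHAR(20) NOT NULL,
  KEY (field_name)
) ENGINE=InnoDB;

-- Made-up sample data: 'a' appears three times, 'c' twice, 'b' once.
INSERT INTO table1_test (id, field_name) VALUES
  (1, 'a'), (2, 'a'), (3, 'a'),
  (4, 'b'),
  (5, 'c'), (6, 'c');

-- Equivalent SELECT: check the plan, then list the rows the DELETE would remove.
EXPLAIN
SELECT t1.id
FROM table1_test AS t1
JOIN table1_test AS t2 ON t1.field_name = t2.field_name
WHERE t1.id > t2.id;

SELECT DISTINCT t1.id
FROM table1_test AS t1
JOIN table1_test AS t2 ON t1.field_name = t2.field_name
WHERE t1.id > t2.id
ORDER BY t1.id;
-- rows that would be deleted: ids 2, 3, 6

-- Turn off implicit commits so the experiment can be undone.
SET autocommit = 0;

DELETE t1
FROM table1_test AS t1
JOIN table1_test AS t2 ON t1.field_name = t2.field_name
WHERE t1.id > t2.id;

SELECT id, field_name FROM table1_test ORDER BY id;
-- remaining rows: ids 1 ('a'), 4 ('b'), 5 ('c')

-- Discard the DELETE and return to the initial six rows.
ROLLBACK;
```

Because the DELETE runs with autocommit off, the final ROLLBACK restores all six rows, so the experiment can be repeated with a different query.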