I have a script to find duplicate rows in my MySQL table, which contains 40,000,000 rows, but it is very slow. Is there an easier way to find the duplicate records without going in and out of PHP?
This is the script I currently use:
$find = mysql_query("SELECT * FROM pst_nw WHERE ID < '1000'");
while ($row = mysql_fetch_assoc($find))
{
// Look up every row sharing this row's address fields (this row included)
$find_1 = mysql_query("SELECT * FROM pst_nw WHERE add1 = '$row[add1]' AND add2 = '$row[add2]' AND add3 = '$row[add3]' AND add4 = '$row[add4]'");
// More than one match means this row has at least one duplicate
if (mysql_num_rows($find_1) > 1) {
mysql_query("DELETE FROM pst_nw WHERE ID = '$row[ID]'");
}
}
You have a number of options.
Let the DB do the work
Create a copy of your table with a unique index – and then insert the data into it from your source table:
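A minimal sketch of this approach, not a definitive script. The names `pst_nw_clean`, `pst_nw_old` and `uniq_address` are hypothetical, and it assumes that `add1` to `add4` are the columns that define a duplicate, that they are short enough to fit in one index, and that they are NOT NULL (a unique index does not treat NULLs as equal, so rows containing NULL would not be collapsed).

```sql
-- Empty copy with the same columns and indexes as the source table
CREATE TABLE pst_nw_clean LIKE pst_nw;

-- The unique index is what rejects the duplicates (index name is an assumption)
ALTER TABLE pst_nw_clean
  ADD UNIQUE INDEX uniq_address (add1, add2, add3, add4);

-- INSERT IGNORE skips rows that would violate the unique index,
-- so only one row per address combination is copied
INSERT IGNORE INTO pst_nw_clean
SELECT * FROM pst_nw;

-- Compare the counts before touching the source table
SELECT COUNT(*) FROM pst_nw;
SELECT COUNT(*) FROM pst_nw_clean;

-- Once satisfied, swap the tables in one statement
RENAME TABLE pst_nw TO pst_nw_old, pst_nw_clean TO pst_nw;

-- Drop the original only after checking the result
-- DROP TABLE pst_nw_old;
```

Keeping the old table around under a different name until the new one has been checked is what makes this option the safer of the two.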
The advantage of doing things this way is you can verify that your new table is correct before dropping your source table. The disadvantage is it takes up twice as much space and is (relatively) slow to execute.
Let the DB do the work #2
You can also achieve the result you want by doing:
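A sketch of what this might look like, assuming the same four address columns define a duplicate and using a hypothetical index name `uniq_address`. Note that `ALTER IGNORE TABLE` only exists on older servers: it was removed in MySQL 5.7.4, so on newer versions this statement is a syntax error and the copy-table approach is the one to use.

```sql
-- Force the older table-copy ALTER algorithm for this session
SET SESSION old_alter_table = 1;

-- Add the unique index in place; with IGNORE, rows that collide on the
-- index are dropped and a single row per address combination is kept
ALTER IGNORE TABLE pst_nw
  ADD UNIQUE INDEX uniq_address (add1, add2, add3, add4);
```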
The first command is required as a workaround for the IGNORE flag otherwise being ignored.
The advantage here is there’s no messing about with a temporary table – the disadvantage is you don’t get to check that your update does exactly what you expect before you run it.
Example:
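The following is an illustrative walk-through on a throwaway table rather than real output from the table in the question. The table `dup_demo`, its columns and its data are all made up for the illustration, and the `ALTER IGNORE` step again assumes a server older than MySQL 5.7.4.

```sql
-- Hypothetical demo table with one duplicated address
CREATE TABLE dup_demo (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  add1 VARCHAR(50) NOT NULL,
  add2 VARCHAR(50) NOT NULL
);

INSERT INTO dup_demo (add1, add2) VALUES
  ('1 High St', 'Leeds'),
  ('1 High St', 'Leeds'),
  ('2 Low Rd',  'York');

-- List the address combinations that occur more than once
SELECT add1, add2, COUNT(*) AS copies
FROM dup_demo
GROUP BY add1, add2
HAVING COUNT(*) > 1;

-- Remove the duplicates in place
SET SESSION old_alter_table = 1;
ALTER IGNORE TABLE dup_demo
  ADD UNIQUE INDEX uniq_address (add1, add2);

-- Inspect what is left
SELECT * FROM dup_demo;
```

After the `ALTER`, the final `SELECT` should show one row per distinct address. The `GROUP BY ... HAVING` query also works on its own as a way to find duplicates without deleting anything.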
Don’t do this kind of thing outside the DB
Especially with 40 million rows, doing something like this outside the DB is likely to take a huge amount of time, and may not complete at all. Any solution that stays in the DB will be faster and more robust.