I have just been asked to fix our address DB at work because it is very slow. I don’t normally go near it, as another guy looked after it, but he has left now, so it’s down to me.
The problem is that the DB contains 5 tables and a lot of information is replicated across them. There should be 27 million rows, but there are 30 million, so over 3 million rows are repeated. The way our old IT guy had it set up, a query would search all 5 tables, and he used a PHP script to weed out the duplicate rows so that information was only shown once. This is slowing our server down considerably. So I wrote a PHP script to take each row, compare it against the 30 million others, and delete it if there was a duplicate. However, 2 minutes after I started it the server crashed. I tried a few other PHP scripts, but every time I try to run a complex MySQL query the server crashes.
Is there an easy way that won’t crash the server to merge all the tables and delete all the duplicated entries?
Copy of the DB (table, rows, engine, collation, size, overhead):
post1 10,044,279 MyISAM latin1_german2_ci 758.1 MiB -
post2 8,328,333 MyISAM latin1_german2_ci 624.7 MiB -
postcode 9,344,317 MyISAM latin1_german2_ci 703.8 MiB -
postcode_nw 1,157,217 InnoDB utf8_unicode_ci 97.6 MiB -
postcode_tmp 1,749,650 MyISAM latin1_german2_ci 50.5 MiB -
A common problem with PHP developers is that they forget there is such a thing as memory in computers.
It “smells” as if you tried to load everything into memory.
Your approach was actually the right one: it will be very slow, but safe, if you implement it correctly.
You do not need to care about speed, as it is a one-time job.
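To make "implement it correctly" a bit more concrete, here is a rough sketch of a slow-but-safe merge that never holds more than one small batch of rows in memory. It is written in Python against an in-memory SQLite database purely as a stand-in, so it runs on its own. The column names (street, town, postcode), the target table name (merged) and the batch size are all made up for illustration, since the real schema is not shown. Note that it swaps the row-by-row comparison for a UNIQUE index that lets the database reject repeats; on MySQL the comparable statement would be INSERT IGNORE into a table with a UNIQUE index, but treat that as something to try on a copy first.

```python
import sqlite3


def merge_without_duplicates(conn, source_tables, target, columns, batch_size=10_000):
    """Copy rows into `target` one small batch at a time, skipping duplicates."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    # The UNIQUE constraint makes the database itself refuse repeated rows.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {target} ({col_list}, UNIQUE ({col_list}))"
    )
    insert_sql = f"INSERT OR IGNORE INTO {target} ({col_list}) VALUES ({placeholders})"
    for table in source_tables:
        last_id = 0
        while True:
            # Page through the table by rowid so only one batch is in memory.
            rows = conn.execute(
                f"SELECT rowid, {col_list} FROM {table} "
                "WHERE rowid > ? ORDER BY rowid LIMIT ?",
                (last_id, batch_size),
            ).fetchall()
            if not rows:
                break
            conn.executemany(insert_sql, [row[1:] for row in rows])
            conn.commit()  # commit per batch so a crash loses little work
            last_id = rows[-1][0]


# Tiny demo with hypothetical data: one address appears in both tables.
conn = sqlite3.connect(":memory:")
for name in ("post1", "post2"):
    conn.execute(f"CREATE TABLE {name} (street TEXT, town TEXT, postcode TEXT)")
conn.executemany(
    "INSERT INTO post1 VALUES (?, ?, ?)",
    [("1 High St", "Leeds", "LS1 1AA"), ("2 Low Rd", "York", "YO1 2BB")],
)
conn.executemany(
    "INSERT INTO post2 VALUES (?, ?, ?)",
    [("1 High St", "Leeds", "LS1 1AA"), ("3 Mill Ln", "Hull", "HU1 3CC")],
)
merge_without_duplicates(
    conn, ["post1", "post2"], "merged", ["street", "town", "postcode"], batch_size=1
)
merged_count = conn.execute("SELECT COUNT(*) FROM merged").fetchone()[0]
print(merged_count)
```

The point is the shape rather than the exact code: read a limited number of rows, write them, commit, and repeat. The same loop can be written in PHP with a LIMIT on each query. It will take a long time over 30 million rows, but memory use stays flat, and if it stops halfway you can simply run it again.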