I have 15M rows in 3 tables (one table is the original CSV import, the other two are normalized versions of that CSV + some other data).
I need to simply update one field from the original CSV table. The update query joining these tables has now run for 30 hours on my quad-core-8GB-ssd box.
- Is this normal? Is there a better way to perform this simple update?
Tables:
- ti (the CSV dump, denormalized, ~13M rows)
- i (the primary, normalized table, ~17M rows)
- icm (a map of ti.raw_id to i.item_id, ~17M rows)
mysql> explain select * from item AS i, item_catalog_map AS icm, temp_input AS ti WHERE i.id=icm.item_id AND icm.catalog_unique_item_id=ti.productID;
+----+-------------+-------+--------+----------------------+----------------------+---------+------------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------+----------------------+---------+------------+----------+-------------+
| 1 | SIMPLE | i | ALL | PRIMARY | NULL | NULL | NULL | 13652592 | |
| 1 | SIMPLE | icm | ref | IDX_ACD6184F126F525E | IDX_ACD6184F126F525E | 5 | frugg.i.id | 1 | Using where |
| 1 | SIMPLE | ti | eq_ref | PRIMARY | PRIMARY | 767 | func | 1 | Using where |
+----+-------------+-------+--------+----------------------+----------------------+---------+------------+----------+-------------+
3 rows in set (0.06 sec)
mysql> UPDATE item AS i, item_catalog_map AS icm, temp_input AS ti
-> SET i.name=ti.productName,
-> icm.price=ti.retailPrice,
-> icm.conversion_url=productURL
-> WHERE i.id=icm.item_id AND icm.catalog_unique_item_id=ti.productID;
First of all, if your denormalized data has 13M records, but both of your “normalized” tables have 17M records, then you are not getting much compression out of your normalization.
Second, you are trying to update both normalized tables in one SQL statement. I would update the mapping table first, then update the data table in a second SQL statement.
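Third, use the explicit JOIN syntax. Your comma-separated table list with the join conditions in the WHERE clause is the old-school way of writing an inner join, not a true three-way Cartesian product, and the optimizer should treat both forms the same, so this alone may not change the plan. Nonetheless, JOIN ... ON keeps each join condition next to its table and makes an accidental Cartesian product much harder to write.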
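The second and third points together might look roughly like the sketch below. It reuses the table and column names from the question and assumes that productURL is a column of temp_input (the original query leaves it unqualified). Treat it as a starting point to test on a copy of the data, not as a guaranteed fix.

```sql
-- Step 1: update the mapping table straight from the CSV import.
UPDATE item_catalog_map AS icm
JOIN temp_input AS ti
  ON icm.catalog_unique_item_id = ti.productID
SET icm.price = ti.retailPrice,
    icm.conversion_url = ti.productURL;  -- assumption: productURL belongs to temp_input

-- Step 2: update the item table, reaching the CSV rows through the mapping.
UPDATE item AS i
JOIN item_catalog_map AS icm
  ON i.id = icm.item_id
JOIN temp_input AS ti
  ON icm.catalog_unique_item_id = ti.productID
SET i.name = ti.productName;
```

Note that step 1 no longer joins to item, so it would also touch mapping rows that have no matching item row. If such rows can exist and must stay unchanged, add the join to item back into the first statement.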
Finally, make sure you have the following indexes: