I have a stored procedure that opens a CURSOR on a SELECT statement and iterates over a table of 15M rows (this table is a simple import of a large CSV).
I need to normalize that data by inserting various pieces of each row into 3 different tables (capturing auto-increment IDs, using them in foreign key constraints, and so on).
So I wrote a simple stored procedure: open a CURSOR, FETCH the fields into variables, and run the 3 INSERT statements.
I’m on a small DB server with a default MySQL installation (1 CPU, 1.7 GB RAM). I had hoped this task would take a few hours. I’m at 24+ hours, and top shows about 85% of CPU time in I/O wait (wa).
I think I have some kind of terrible inefficiency. Any ideas on improving the efficiency of the task? Or just determining where the bottleneck is?
root@devapp1:/mnt/david_tmp# vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 0  1    256  13992  36888 1466584    0    0     9    61    1    1  0  0 98  1
 1  2    256  15216  35800 1466312    0    0    57  7282  416  847  2  1 12 85
 0  1    256  14720  35984 1466768    0    0    42  6154  387  811  2  1 10 87
 0  1    256  13736  36160 1467344    0    0    51  6979  439  934  2  1  9 89
DROP PROCEDURE IF EXISTS InsertItemData;
DELIMITER $$
CREATE PROCEDURE InsertItemData()
BEGIN
    DECLARE spd TEXT;
    DECLARE lpd TEXT;
    DECLARE pid INT;
    DECLARE iurl TEXT;
    DECLARE last_id INT UNSIGNED;
    DECLARE done INT DEFAULT FALSE;
    DECLARE raw CURSOR FOR
        SELECT t.shortProductDescription, t.longProductDescription, t.productID, t.productImageURL
        FROM frugg.temp_input t;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN raw;
    read_loop: LOOP
        FETCH raw INTO spd, lpd, pid, iurl;
        IF done THEN
            LEAVE read_loop;
        END IF;
        INSERT INTO item (short_description, long_description) VALUES (spd, lpd);
        SET last_id = LAST_INSERT_ID();
        INSERT INTO item_catalog_map (catalog_id, catalog_unique_item_id, item_id) VALUES (1, CAST(pid AS CHAR), last_id);
        INSERT INTO item_images (item_id, original_url) VALUES (last_id, iurl);
    END LOOP;
    CLOSE raw;
END$$
DELIMITER ;
MySQL will almost always perform better executing straight SQL statements than looping inside a stored procedure.
That said, if you are using InnoDB tables, your procedure will run faster inside a START TRANSACTION/COMMIT block, since with autocommit on each INSERT is otherwise committed on its own.
Even better would be to add an AUTO_INCREMENT column to the records in frugg.temp_input (by copying them into a second table, temp_input2) and to query against that table with plain INSERT ... SELECT statements instead of a cursor.
Even better than the above: before loading the .CSV file into frugg.temp_input, add an AUTO_INCREMENT field to it, saving you the extra step of creating and loading temp_input2.
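The set-based approach described above could look roughly like the following. This is only a sketch under some assumptions that are not in the question: the AUTO_INCREMENT primary key of item is assumed to be named id, item is assumed to be empty (or at least to hold no ids that collide with the generated ones), and the column types of temp_input2 are guessed from the variables declared in the procedure.

```sql
-- Assumption: temp_input2 and its `id` column are illustrative;
-- the column types mirror the DECLAREs in the stored procedure.
DROP TABLE IF EXISTS frugg.temp_input2;
CREATE TABLE frugg.temp_input2 (
    id                      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    shortProductDescription TEXT,
    longProductDescription  TEXT,
    productID               INT,
    productImageURL         TEXT
);

-- Copy the raw rows; MySQL assigns each one an id as it is inserted.
INSERT INTO frugg.temp_input2
    (shortProductDescription, longProductDescription, productID, productImageURL)
SELECT t.shortProductDescription, t.longProductDescription, t.productID, t.productImageURL
FROM frugg.temp_input t;

-- Three set-based inserts replace the row-by-row cursor loop.
-- Assumption: item's AUTO_INCREMENT key is named `id` and accepts
-- the explicit values generated above without collisions.
INSERT INTO item (id, short_description, long_description)
SELECT id, shortProductDescription, longProductDescription
FROM frugg.temp_input2;

INSERT INTO item_catalog_map (catalog_id, catalog_unique_item_id, item_id)
SELECT 1, CAST(productID AS CHAR), id
FROM frugg.temp_input2;

INSERT INTO item_images (item_id, original_url)
SELECT id, productImageURL
FROM frugg.temp_input2;
```

Because the id is generated once in the staging table and reused in all three inserts, there is no need for LAST_INSERT_ID() at all. If you would rather keep the procedure as it is, the transaction variant is simply START TRANSACTION; CALL InsertItemData(); COMMIT;.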