I have a stored procedure that adds nodes to a tree. The table structure is:
id INT PRIMARY KEY
label VARCHAR(1) /* the value of the node which is a character */
parent_id INT /* id of the parent node */
Here is my stored procedure:
/*
takes a word and adds every character to the table,
where every character is a child of the previous character;
the first character of every word is a child of the root
*/
CREATE PROCEDURE rule(IN word VARCHAR(255))
BEGIN
    /* (parent_id = 0) => child of root */
    DECLARE pid INT DEFAULT 0; /* id of the current parent node */
    DECLARE npid INT DEFAULT 0; /* id of the node that becomes the next parent */
    DECLARE strlength INT;
    DECLARE j INT DEFAULT 1;
    DECLARE query_count INT DEFAULT 0;
    DECLARE active_char VARCHAR(1);
    /* CHAR_LENGTH counts characters; LENGTH counts bytes and overshoots on multi-byte input */
    SET strlength = CHAR_LENGTH(word);
    /* loop through the word */
    WHILE j <= strlength DO
        /* get a single character from word */
        SET active_char = SUBSTR(word, j, 1);
        /* if the character doesn't already exist under this parent, insert it */
        SELECT COUNT(*) INTO query_count FROM tree
            WHERE parent_id = pid AND label = active_char;
        IF (query_count = 0) THEN
            INSERT INTO tree (label, parent_id)
                VALUES (active_char, pid);
        END IF;
        /* look up the node's id; it is the parent of the next character */
        SELECT id INTO npid FROM tree
            WHERE label = active_char AND parent_id = pid;
        SET pid = npid;
        SET j = j + 1;
    END WHILE;
END //
I’m sure there are a few tweaks that would make the procedure a bit more efficient, but I can’t think of anything that would significantly reduce the time needed.
I’m dealing with a lot of words, so this procedure is run a few hundred thousand times, which in turn means a lot of inserts and a lot of queries. It takes hours, perhaps days (I’m not sure, because I gave up waiting and stopped the process).
The thing is, I don’t think I can do a bulk insert, because every insert depends on the id produced by a previous insert.
I was wondering if there’s some way to create a virtual table that’s stored in main memory, do all these operations quickly there, and then just save the result into the actual table.
At the moment, the only solution I can think of is to build the tree in PHP and then do a bulk insert. I think this should be faster, but I’m not sure by how much.
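To make that idea concrete, here is a minimal sketch of the in-memory approach, written in Python rather than PHP purely for illustration. The function name `build_rows` is hypothetical, and the sketch assumes the tree table starts out empty so that ids can be assigned on the client, starting at 1, instead of coming from the database. Because each node gets its id before anything is written, the dependency on a previous insert disappears.

```python
from typing import Dict, List, Tuple

ROOT_ID = 0  # parent_id = 0 means "child of root", as in the procedure


def build_rows(words: List[str]) -> List[Tuple[int, str, int]]:
    """Return one (id, label, parent_id) row per distinct node.

    Assumption: the target table is empty, so ids can be handed out
    here (1, 2, 3, ...) rather than generated by the database.
    """
    # Maps (parent_id, label) to the id already assigned to that node,
    # replacing the SELECT COUNT(*) / SELECT id round trips.
    node_ids: Dict[Tuple[int, str], int] = {}
    rows: List[Tuple[int, str, int]] = []
    next_id = 1

    for word in words:
        pid = ROOT_ID
        for ch in word:
            key = (pid, ch)
            nid = node_ids.get(key)
            if nid is None:
                # First time this character appears under this parent.
                nid = next_id
                next_id += 1
                node_ids[key] = nid
                rows.append((nid, ch, pid))
            pid = nid  # this node is the parent of the next character
    return rows
```

Calling `build_rows(["cat", "car"])` yields four rows, since the two words share the nodes for `c` and `a`. The resulting rows could then be written in large batches, for example with multi-row `INSERT` statements or `LOAD DATA INFILE`, instead of one statement per character.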
Any help would be really appreciated.
Thanks.
Whilst I’m afraid I can’t claim to have used it myself, passing the data down to the stored procedure as XML and processing it as described here would seem a reasonable approach. N.B. this requires MySQL 5.1 or higher.