I am currently working on a project that involves manipulating data in a MySQL database. I use a Perl script that runs on the same machine as the database. The table I am working on is created as follows:
CREATE TABLE `deCoupled` (
`AA` double NOT NULL DEFAULT '0',
...several other fields,
KEY `AA` (`AA`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
In order to optimize the way I work on the data, I create a temporary table like this:
CREATE TABLE `temp_deCoupled` AS SELECT * FROM `deCoupled` ORDER BY field1,field2,...,fieldN
and add an auto_increment key field that I need for the data manipulation:
ALTER TABLE `temp_deCoupled` ADD COLUMN MY_KEY INT NOT NULL AUTO_INCREMENT KEY
ALTER TABLE `temp_deCoupled` ADD INDEX (MY_KEY)
I alter the table like this, because I scan the table with the query
SELECT COUNT(`AA`), field1, field2,..., fieldN FROM `temp_deCoupled`
GROUP BY field1, field2,..., fieldN ORDER BY field1, field2,..., fieldN
and I execute updates on records according to the MY_KEY field.
Unfortunately, for about 75,000 records, it takes about 75 minutes on a PC
with a dual-core CPU and 2 GB of RAM. The Perl script that manipulates the data does not do any complex calculations.
I tried to tune the MySQL server and updated the my.cnf file with the following:
key_buffer = 256M
sort_buffer_size = 128M
read_buffer_size = 64M
read_rnd_buffer_size = 64M
key_buffer_size = 128M
table_cache = 1024
query_cache_limit = 128M
query_cache_size = 128M
innodb_buffer_pool_size = 768M
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
I really need to lower the execution time of the script. Can anyone make any suggestions?
To be more precise about the updates, here is a sample of the code:
$qSel = "SELECT COUNT(*), field1,..., fieldN FROM `temp_deCoupled` GROUP BY field1,..., fieldN ORDER BY field1,...,fieldN";
$stmt = $dbh->prepare($qSel);
$stmt->execute() or die "Error occurred: $DBI::errstr.\n";
while($stmt->fetch()) {
.... *some code*...
$q_sel_keys = "SELECT MY_KEY FROM `temp_deCoupled` WHERE field1 = value1 AND ... AND fieldN = valueN";
$stmt1 = $dbh->prepare($q_sel_keys);
$stmt1->execute() or die "Error occurred: $DBI::errstr.\n";
...*some other code*...
$q_Update_Records = "UPDATE `temp_deCoupled` SET field1=val_1,..., fieldN=val_N WHERE MY_KEY = key1 OR MY_KEY = key2 OR ... OR MY_KEY = keyN";
$stmt1 = $dbh->prepare($q_Update_Records);
$tmp_c = $stmt1->execute() or die "Error occurred: $DBI::errstr.\n";
...*some final code*...
}
That is, in general, the main body of the data manipulation in Perl.
It looks like you have provided a lot of information, but not the key information (if you will excuse the pun) needed. That is: what do the updates that take so long do?
If you are individually executing 75000 update statements, that is going to take a long time.
Try grouping them together where the operation performed by the update is the same and only the key differs, e.g. with a single UPDATE whose WHERE clause lists all the keys at once (WHERE MY_KEY IN (key1, key2, ...)).
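A minimal sketch of such a grouped update, assuming the same change applies to every listed row. The column `field1`, the value `'new_value'`, and the key values are placeholders for illustration, not taken from the actual data:

```sql
-- One statement updates every row in the group,
-- instead of sending one UPDATE per key.
UPDATE `temp_deCoupled`
SET field1 = 'new_value'          -- placeholder column and value
WHERE MY_KEY IN (1, 2, 3, 4);     -- placeholder keys collected by the script
```

This way the script makes one round trip to the server per group rather than one per row.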
In the worst case, where the updates are largely distinct, you can use another table to provide the information needed for the update. For instance, given a table with columns id and bar, where you need to multiply each bar by a different value based on id, create another table to hold the multipliers, insert them in a single request from your script, and then update the original table with a join against the multipliers table.
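As a rough sketch of that approach: the table names `foo` and `multipliers`, the column types, and the sample values below are all assumptions made for illustration.

```sql
-- Hypothetical target table: each row's bar must be scaled by a per-id factor.
CREATE TABLE foo (
  id  INT NOT NULL PRIMARY KEY,
  bar INT NOT NULL
);

-- Helper table holding one multiplier per id.
CREATE TABLE multipliers (
  id         INT NOT NULL PRIMARY KEY,
  multiplier INT NOT NULL
);

-- The script sends all multipliers in one multi-row INSERT (sample values).
INSERT INTO multipliers (id, multiplier)
VALUES (1, 2), (2, 3), (3, 5);

-- A single multi-table UPDATE then applies every distinct change at once.
UPDATE foo
JOIN multipliers ON foo.id = multipliers.id
SET foo.bar = foo.bar * multipliers.multiplier;
```

Rows in `foo` with no matching id in `multipliers` are left untouched, because the join is an inner join.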
It can be a good idea to break up the insert statements so that each one is no longer than 1MB or so, which keeps each statement within the server's max_allowed_packet limit.
In extreme cases, write the data to insert out to a file and load it with “LOAD DATA INFILE”.
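A sketch of the file-based variant, assuming a hypothetical helper table `multipliers` with columns `id` and `multiplier`, and a tab-separated file at the made-up path `/tmp/multipliers.tsv` written by the script with one `id<TAB>multiplier` pair per line:

```sql
-- Bulk-load the helper table from a file on the database server's filesystem.
-- The path and the table/column names are placeholders.
LOAD DATA INFILE '/tmp/multipliers.tsv'
INTO TABLE multipliers
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(id, multiplier);
```

Since the script runs on the same machine as the server, the file can be read server-side, but the MySQL account needs the FILE privilege and the path has to be allowed by the server's `secure_file_priv` setting if one is configured.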