I am currently working on a project which involves data manipulation of a MySQL


Asked: June 10, 2026 (2026-06-10T11:48:20+00:00)


I am currently working on a project that involves manipulating data in a MySQL database. I use a Perl script that runs on the same machine as the database server. The table I am working on is created as follows:

CREATE TABLE `deCoupled` (
    `AA` double NOT NULL DEFAULT '0',
     ...several other fields,
     KEY `AA` (`AA`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

In order to optimize the way I work on the data, I create a temporary table like this:

CREATE TABLE `temp_deCoupled` AS SELECT * FROM `deCoupled` ORDER BY field1,field2,...,fieldN

and add an auto_increment key field that I need for the data manipulation:

ALTER TABLE `temp_deCoupled` ADD COLUMN MY_KEY INT NOT NULL AUTO_INCREMENT KEY
ALTER TABLE `temp_deCoupled` ADD INDEX (MY_KEY)

I alter the table like this, because I scan the table with the query

SELECT COUNT(`AA`), field1, field2,..., fieldN FROM `temp_deCoupled`
GROUP BY field1, field2,..., fieldN ORDER BY field1, field2,..., fieldN

and I execute updates on records according to the MY_KEY field.
Unfortunately, for about 75000 records, this takes about 75 minutes on a PC
with a dual-core CPU and 2 GB of RAM. Note that the Perl script that manipulates the data does not do any complex calculations.

I tried to tune the MySQL server by adding the following settings to my.cnf:

key_buffer = 256M
sort_buffer_size = 128M
read_buffer_size = 64M
read_rnd_buffer_size = 64M
key_buffer_size = 128M
table_cache = 1024
query_cache_limit = 128M
query_cache_size = 128M
innodb_buffer_pool_size = 768M
innodb_thread_concurrency = 8
innodb_flush_method = o_DIRECT

I really need to lower the execution time of the script. Can anyone make any suggestions?

To be more precise about the updates I will post a sample of the code below:

$qSel = "SELECT COUNT(*), field1,..., fieldN FROM `temp_deCoupled` GROUP BY field1,..., fieldN ORDER BY field1,...,fieldN";
$stmt = $dbh->prepare($qSel);
$stmt->execute() or die "Error occurred: $DBI::errstr.\n";
while($stmt->fetch()) {
    .... *some code*...
    $q_sel_keys = "SELECT MY_KEY FROM `temp_deCoupled` WHERE field1 = value1 AND ... AND fieldN = valueN";
    $stmt1 = $dbh->prepare($q_sel_keys);
    $stmt1->execute() or die "Error occurred: $DBI::errstr.\n";
    ...*some other code*...
    $q_Update_Records = "UPDATE `temp_deCoupled` SET field1=val_1,..., fieldN=val_N WHERE MY_KEY = key1 OR MY_KEY = key2 OR ... OR MY_KEY = keyN";
    $stmt1 = $dbh->prepare($q_Update_Records);
    $tmp_c = $stmt1->execute() or die "Error occurred: $DBI::errstr.\n";
    ...*some final code*...
}

and that is the main body (in general) of the data manipulation in Perl.


1 Answer

Editorial Team · Answer 1 · 2026-06-10T11:48:21+00:00

It looks like you have provided a lot of information, but not the key information (if you will excuse the pun) needed. That is: what do the updates that take so long do?

If you are individually executing 75000 update statements, that is going to take a long time.
Try grouping them together where the operation performed by the update is the same and only the key differs, e.g. doing:

update temp_deCoupled set fieldx=..., fieldy=... where my_key in (?,?,?,?,...)
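As a rough illustration of the batching idea, here is a minimal Python sketch. It uses an in-memory SQLite database purely as a stand-in for MySQL so that it can run on its own; the helper name update_in_batches, the batch size, and the demo data are assumptions made for this example and are not part of the original script. The same placeholder pattern should carry over to Perl DBI, which also accepts ? placeholders.

```python
import sqlite3

def update_in_batches(conn, keys, new_value, batch_size=200):
    """Apply the same SET to many rows, issuing one UPDATE per batch of keys."""
    updated = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # One "?" per key, so the whole batch travels in a single statement.
        placeholders = ",".join("?" for _ in batch)
        sql = (
            "UPDATE temp_deCoupled SET fieldx = ? "
            f"WHERE my_key IN ({placeholders})"
        )
        cur = conn.execute(sql, [new_value, *batch])
        updated += cur.rowcount
    conn.commit()  # commit once at the end rather than once per row
    return updated

# Demo on a throwaway table (SQLite here; the real table lives in MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_deCoupled (my_key INTEGER PRIMARY KEY, fieldx REAL)"
)
conn.executemany(
    "INSERT INTO temp_deCoupled VALUES (?, 0.0)",
    [(k,) for k in range(1, 1001)],
)

odd_keys = list(range(1, 1001, 2))  # hypothetical set of keys to update
changed = update_in_batches(conn, odd_keys, 1.5)
```

With 500 keys and a batch size of 200, this issues three UPDATE statements instead of 500.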

In a worst case scenario, where the updates are largely distinct, you can use another table to provide the information needed for the update. For instance, given this table:

create table foo ( id int primary key, bar double );

where you need to multiply each bar by a different value based on id, create another table to hold the multipliers, insert them in a single request from your script, and then update:

create temporary table foo_multiply ( id int primary key, mult double );
insert into foo_multiply values (1,123),(2,42),(3,666),...;
update foo inner join foo_multiply using (id) set foo.bar=foo.bar * foo_multiply.mult;
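The side-table approach can be tried end to end with the sketch below. It is an illustration under stated assumptions, not the exact MySQL statements: it runs against an in-memory SQLite database, and because the UPDATE ... INNER JOIN form above is MySQL syntax, the sketch expresses the same update with a correlated subquery instead. The starting values in foo are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, bar REAL)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(1, 2.0), (2, 10.0), (3, 1.0), (4, 7.0)],  # made-up demo rows
)

# Per-row multipliers computed by the script; id 4 deliberately has none.
multipliers = [(1, 123.0), (2, 42.0), (3, 666.0)]

conn.execute(
    "CREATE TEMPORARY TABLE foo_multiply (id INTEGER PRIMARY KEY, mult REAL)"
)
# executemany keeps the demo short; against MySQL a single multi-row
# INSERT, as in the answer above, is what avoids the per-row round trips.
conn.executemany("INSERT INTO foo_multiply VALUES (?, ?)", multipliers)

# One statement updates every row that has a multiplier.
conn.execute(
    """
    UPDATE foo
    SET bar = bar * (SELECT mult FROM foo_multiply
                     WHERE foo_multiply.id = foo.id)
    WHERE id IN (SELECT id FROM foo_multiply)
    """
)
conn.commit()

result = dict(conn.execute("SELECT id, bar FROM foo ORDER BY id"))
```

Rows without an entry in foo_multiply (id 4 in this demo) are left untouched, which matches the behaviour of the inner join.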

It can be a good idea to split a large insert into several statements of no more than about 1MB each, so that no single statement exceeds the server's max_allowed_packet limit.
In extreme cases, write the data out to a file and load it with LOAD DATA INFILE.
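One way to split the inserts is sketched below in Python. The function name chunk_insert_statements and the max_bytes default are assumptions for illustration. The sketch also assumes every value is a plain number, so repr() yields a valid SQL literal and character count equals byte count; string values would need proper quoting or placeholders instead.

```python
def chunk_insert_statements(table, rows, max_bytes=1_000_000):
    """Yield multi-row INSERT statements, each at most max_bytes long.

    Assumes numeric values only (ints/floats). A single row larger than
    max_bytes is still emitted on its own.
    """
    prefix = f"INSERT INTO {table} VALUES "
    current, size = [], len(prefix)
    for row in rows:
        tup = "(" + ",".join(repr(v) for v in row) + ")"
        extra = len(tup) + 1  # +1 for the separating comma or final semicolon
        if current and size + extra > max_bytes:
            yield prefix + ",".join(current) + ";"
            current, size = [], len(prefix)
        current.append(tup)
        size += extra
    if current:
        yield prefix + ",".join(current) + ";"

# Demo with a deliberately tiny limit so that several statements are produced.
rows = [(i, i * 0.5) for i in range(1, 1001)]
statements = list(chunk_insert_statements("foo_multiply", rows, max_bytes=200))
```

Each generated statement can then be sent to the server in turn, ideally inside one transaction.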
