I have to update 2M*2 rows in a MySQL database.
All the information is in a file that I process with PHP.
I get the information into an array and then push it into the database using:
UPDATE processed
SET number1=$row[1], number2=$row[2], timestamp=unix_timestamp()
where match (id) against ('\"$id\"' IN BOOLEAN MODE) limit 1
That's working, but it takes so long…
I have an index (primary) on (id).
I have tried using something other than (id) that is on a fulltext index (I'm using MyISAM); it's even slower.
As my database is pretty big and MySQL has to go through everything to find the right row to update, each update takes a few seconds, which means a few days to process my whole update!
Is there any faster way to do that?
If I switch to InnoDB, will it be faster? (Even if it's not, I guess it could be nice during the update, since my whole table won't be locked.)
As number1 and number2 are numbers, I thought about grouping all the (id)s that have to be updated to the same numbers. Would that be faster?
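Grouping by value could look roughly like the sketch below: rows that share the same (number1, number2) pair are collapsed into one `UPDATE ... WHERE id IN (...)`, so the number of statements drops to the number of distinct pairs. This is only a sketch under assumptions: the input format `id:number1:number2` with an integer primary-key `id` and the helper name `build_grouped_updates` are hypothetical, and it only pays off if many rows really share the same pair.

```php
<?php
// Hypothetical helper: group rows with identical (number1, number2) so a single
// UPDATE ... WHERE id IN (...) covers many rows. Assumes each input line looks
// like "id:number1:number2" where id is the integer primary key.
function build_grouped_updates(array $lines, int $maxIds = 1000): array
{
    $groups = [];
    foreach ($lines as $line) {
        [$id, $n1, $n2] = explode(":", trim($line));
        // Cast to int so nothing from the file is interpolated into SQL unchecked.
        $groups[(int)$n1 . ":" . (int)$n2][] = (int)$id;
    }

    $statements = [];
    foreach ($groups as $key => $ids) {
        [$n1, $n2] = explode(":", $key);
        // Split very large groups so a single statement stays a reasonable size.
        foreach (array_chunk($ids, $maxIds) as $chunk) {
            $statements[] = sprintf(
                "UPDATE processed SET number1=%d, number2=%d, timestamp=UNIX_TIMESTAMP() WHERE id IN (%s)",
                $n1, $n2, implode(",", $chunk)
            );
        }
    }
    return $statements;
}

$sql = build_grouped_updates(["1:5:7", "2:5:7", "3:9:9"]);
foreach ($sql as $s) {
    echo $s, "\n";
}
```

Each returned string would then be sent with `$db->query($s)`. Because `id` is the primary key, every `IN (...)` lookup uses that index instead of scanning the table.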
Is there a way to tune mysqld so that the number1, number2 and id columns stay in RAM, making them faster to access/update?
Any idea is welcome, as I'm totally lost… 🙂
Edit: adding example code so that you can understand my situation:
foreach ($data_rows as $rows) {
    $row = explode(":", trim($rows)); // $row[0] info (the hash to look up)
                                      // $row[1] new number1
                                      // $row[2] new number2
    $info = $row[0];
    $data = array(); // reset so a previous row's lookup result is not reused
    $safe = $db->real_escape_string($info);
    // The table has no `info` column; the fulltext index is on `hash`.
    $query = $db->query("select * from processed where match (hash) against ('\"$safe\"' IN BOOLEAN MODE) limit 1");
    while ($line = $query->fetch_object())
    {
        $data[$line->hash]['number1'] = $line->number1;
        $data[$line->hash]['number2'] = $line->number2;
        $id = $line->id;
    }
    if (isset($data[$info])) { // Check if we have this one in the database.
        // If both numbers are already correct, no need to update.
        if (($data[$info]['number1'] != $row[1]) || ($data[$info]['number2'] != $row[2])) {
            $db->query("UPDATE processed SET number1=" . (int)$row[1] . ", number2=" . (int)$row[2]
                . ", timestamp=unix_timestamp() where id=" . (int)$id);
            print "updated - $info - $row[1] - $row[2]\n";
        }
    }
    else {
        print "$info not in database\n";
    }
}
Schema:
CREATE TABLE `processed` (
`id` int(30) NOT NULL AUTO_INCREMENT,
`timestamp` int(14) DEFAULT NULL,
`name` text,
`category` int(2) DEFAULT '0',
`subcat` int(2) DEFAULT '0',
`number1` int(20) NOT NULL,
`number2` int(20) NOT NULL,
`comment` text,
`hash` text,
`url` text,
PRIMARY KEY (`id`),
FULLTEXT KEY `name` (`name`),
FULLTEXT KEY `hash` (`hash`)
) ENGINE=MyISAM AUTO_INCREMENT=1328365 DEFAULT CHARSET=utf8;
Edit again:
Running ANALYZE TABLE processed; helped a lot with the time of my UPDATEs (fresh indexes!).
I will load my data into another table and do a joined update anyway 🙂
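Loading the file into another table could be sketched as follows: the lines are turned into a `CREATE TEMPORARY TABLE` plus a few multi-row `INSERT` statements, so there is one round trip per batch instead of two queries per row. The staging table name `processed_import`, its column sizes, the helper `build_staging_load`, and the assumption that the first field of each line equals `processed.hash` exactly are all hypothetical.

```php
<?php
// Hypothetical helper: turn "info:number1:number2" lines into statements that
// create a staging table and fill it with multi-row INSERTs. $escape stands in
// for $db->real_escape_string when a connection is available.
function build_staging_load(array $lines, callable $escape, int $batchSize = 1000): array
{
    $statements = [
        "CREATE TEMPORARY TABLE processed_import ("
        . "hash VARCHAR(255) NOT NULL, "
        . "number1 INT NOT NULL, "
        . "number2 INT NOT NULL, "
        . "KEY (hash))", // indexed so the later join can look rows up quickly
    ];

    $values = [];
    foreach ($lines as $line) {
        [$info, $n1, $n2] = explode(":", trim($line));
        $values[] = sprintf("('%s', %d, %d)", $escape($info), (int)$n1, (int)$n2);
    }

    // One INSERT per batch instead of one round trip per row.
    foreach (array_chunk($values, $batchSize) as $batch) {
        $statements[] = "INSERT INTO processed_import (hash, number1, number2) VALUES "
            . implode(", ", $batch);
    }
    return $statements;
}

// addslashes is only a stand-in for this offline example; with a live mysqli
// connection, pass fn($s) => $db->real_escape_string($s) instead.
$stmts = build_staging_load(["abc:5:7", "def:1:2", "ghi:3:4"], 'addslashes', 2);
foreach ($stmts as $s) {
    echo $s, "\n";
}
```

The batch size is a trade-off: larger batches mean fewer round trips, but each statement must stay under the server's `max_allowed_packet`. A temporary table only exists on the connection that created it, so the load and the later UPDATE have to go through the same `$db` connection.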
You are performing 2M*2 UPDATE commands. That does take a while… I would advise you to dump the file contents into a temp table and then run a single UPDATE command.
Update
Here is how you'd run a single joined UPDATE:
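A sketch of such a joined UPDATE is below, written as a PHP string so it can be passed to `$db->query()`. It assumes the file was already loaded into a hypothetical staging table `processed_import (hash, number1, number2)` whose `hash` values match `processed.hash` exactly. The `WHERE` clause keeps the asker's rule of skipping rows whose numbers are already correct.

```php
<?php
// Sketch: one multi-table UPDATE instead of millions of single-row UPDATEs.
// Assumes a staging table `processed_import` (hypothetical name) on the same
// connection, holding hash, number1 and number2 from the input file.
$joinedUpdate = <<<SQL
UPDATE processed AS p
JOIN processed_import AS i ON i.hash = p.hash
SET p.number1 = i.number1,
    p.number2 = i.number2,
    p.timestamp = UNIX_TIMESTAMP()
WHERE p.number1 <> i.number1
   OR p.number2 <> i.number2
SQL;

// With a mysqli connection $db this would run as: $db->query($joinedUpdate);
echo $joinedUpdate, "\n";
```

Since `processed.hash` only has a FULLTEXT index, MySQL cannot use it for the equality join, but it needs just one pass over `processed` with index lookups into the staging table, rather than one search per input row.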