I have two tables named table_1 (1GB) and reference (250Mb).
When I query a cross join on reference it takes 16hours to update table_1 .. We changed the system files EXT3 for XFS but still it’s taking 16hrs.. WHAT AM I DOING WRONG??
Here is the update/cross join query :
mysql> UPDATE table_1 CROSS JOIN reference ON
-> (table_1.start >= reference.txStart AND table_1.end <= reference.txEnd)
-> SET table_1.name = reference.name;
Query OK, 17311434 rows affected (16 hours 36 min 48.62 sec)
Rows matched: 17311434 Changed: 17311434 Warnings: 0
Here is a show create table of table_1 and reference:
CREATE TABLE `table_1` (
`strand` char(1) DEFAULT NULL,
`chr` varchar(10) DEFAULT NULL,
`start` int(11) DEFAULT NULL,
`end` int(11) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`name2` varchar(255) DEFAULT NULL,
KEY `annot` (`start`,`end`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
CREATE TABLE `reference` (
`bin` smallint(5) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`chrom` varchar(255) NOT NULL,
`strand` char(1) NOT NULL,
`txStart` int(10) unsigned NOT NULL,
`txEnd` int(10) unsigned NOT NULL,
`cdsStart` int(10) unsigned NOT NULL,
`cdsEnd` int(10) unsigned NOT NULL,
`exonCount` int(10) unsigned NOT NULL,
`exonStarts` longblob NOT NULL,
`exonEnds` longblob NOT NULL,
`score` int(11) DEFAULT NULL,
`name2` varchar(255) NOT NULL,
`cdsStartStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`cdsEndStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`exonFrames` longblob NOT NULL,
KEY `chrom` (`chrom`,`bin`),
KEY `name` (`name`),
KEY `name2` (`name2`),
KEY `annot` (`txStart`,`txEnd`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
I see 2 problems with the
UPDATEstatement.There is no index for the
Endfields. The compound indexes (annot) you have will be used only for thestartfields in this query. You should add them as suggested by Emre:Second, the
JOINmay (and probably does) find many rows of tablereferencethat are related to a row oftable_1. So some (or all) rows oftable_1that are updated, are updated many times. Check the result of this query, to see if it is the same as your updated rows count (17311434):There can be other ways to write this query but the lack of a
PRIMARY KEYon both tables makes it harder. If you define a primary key ontable_1, try this, replacingidwith the primary key.Update: No, do not try it on a table with 34M rows. Check the execution plan and try with smaller tables first.
You can check the query plan by running EXPLAIN on the equivalent SELECT: