I’m running what I thought was a fairly straight forward update on a fairly large table. I am trying to find out why this simple update is running so slowly. It took about 5 hours to complete.
master table: approx 2m row and 90 fields.
builder table: approx 1.5m rows and 15 fields
I had initially attempted the insert directly:
-- Update master table with newly calculated mcap
update master as m
inner join
(select b.date_base, b.gvkey, sum(b.sec_cap) as sum_sec_mkt
from builder as b
group by b.gvkey, b.date_base) as x
on x.gvkey = m.gvkey AND
x.date_base = m.date_base
set m.mcap = x.sum_sec_mkt;
Unfortunately this ran for a number of hours and I finally killed it after waiting 4hrs.
I then thought I’d create a temporary table and insert the results from the initial select into it.
CREATE TABLE `temp_mkt_cap` (
`date_base` date NOT NULL,
`gvkey` varchar(15) DEFAULT NULL,
`mkt_cap` double DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
-- insert market cap values in to temporary table
insert into temp_mkt_cap
select b.date_base, b.gvkey, sum(b.sec_cap) as sum_sec_mkt
from builder as b
group by b.gvkey, b.date_base;
ALTER TABLE temp_mkt_cap
add primary key (date_base, gvkey);
The insert worked fine with temp_mkt_cap having about 1.4m rows, but the final update took 5hrs to complete.
-- Update master table with newly calculated mcap
update master as m
inner join temp_mkt_cap as mc
on m.date_base = mc.date_base AND m.gvkey = mc.gvkey
set m.mcap = mc.mkt_cap;
‘master’ has ‘date_base’ and gvkey_iid as PRIMARY KEYS and gvkey as a KEY.
I have completed more complicated inserts and updates on the table before and can’t work out why this isn’t working.
Any help would be greatly appreciated.
Thanks,
Update: The keys on the master table are:
ALTER TABLE master
ADD PRIMARY KEY (gvkey_iid,date_base),
ADD KEY date_offset (date_offset),
ADD KEY gvkey (gvkey),
ADD KEY iid (iid);
Update I added a new key to the master table and the update ran in 93.6secs, down from 5 hours. Thanks for everyone’s help.
ALTER TABLE master
ADD KEY 'date-gvkey' (date_base, gvkey);
Since you are joining on
mc.date_base AND m.gvkey = mc.gvkey, you need an index on these fields in the same order you are joining them, on both tables.If you are joining table1 with table2
on table1.field1 = table2.field1 AND table1.field2 = table2.field2, you need an index on(table1.field1, table1.field2)AND(table2.field1, table2.field2).Not nullfields are preferable.Also, because you are updating from the
mc.mkt_capfield, you need a SINGLE key on this field if it is NOT already the first field of a composite key you created earlier.ALL other keys or indexes are going to possibly slow down your query.
Please inspect carefully your database…