We’re doing an update query between two database tables and it is ridiculously slow. As in: it would take 30 days to perform the query.
One table, lab.list, contains about 940,000 records, the other, mind.list about 3,700,000 (3.7 million)
The update sets a field when two BETWEEN conditions are met. This is the query:
UPDATE lab.list L , mind.list M SET L.locId = M.locId WHERE L.longip BETWEEN M.startIpNum AND M.endIpNum AND L.date BETWEEN "20100301" AND "20100401" AND L.locId = 0
As it is now, the query is performing with about 1 update every 8 seconds.
We also tried it with the mind.list table in the same database, but that doesn’t matter for the query time.
UPDATE lab.list L, lab.mind M SET L.locId = M.locId WHERE longip BETWEEN M.startIpNum AND M.endIpNum AND date BETWEEN "20100301" AND "20100401" AND L.locId = 0;
Is there a way to speed up this query? Basically IMHO it should make two subsets of the databases:
mind.list.longip BETWEEN M.startIpNum AND M.endIpNum
lab.list.date BETWEEN “20100301” AND “20100401”
and then update the values for these subsets. Somewhere along the line I think I made a mistake, but where? Maybe there is a faster query possible?
We tried log_slow_queries, but that shows that it is indeed examining 100s of millions of rows, probably going up all the way to 3331 gigarows.
Tech info:
- Server version: 5.5.22-0ubuntu1-log (Ubuntu)
- lab.list has indexes on locId, longip, date
- lab.mind has indexes on locId, startIpNum AND M.endIpNum
- hardware: 2x xeon 3.4 GHz, 4GB RAM, 128 GB SSD (so that should not be a problem!)
In the end, the query was too big or cumbersome for mysql to fill. Even after indexing. Testing the same query with the same data on a high-end Sybase server, also took 3 hours.
So we abandoned the do it all on the database server thought, and went back to scripting languages.
We did the following in python:
All these updates together take about 5 minutes, so a huge improvement!
Conclusion: