My query performs well up to a point, after which its run time jumps from seconds to hours.
What can I do to A) investigate and B) resolve a major performance bottleneck when querying too much data on MySQL?
Perhaps memory related?
Results
While testing the performance of a stored procedure, I ran it twice within 5 minutes. First:
mysql> CALL TopFromBigTable('2012-04-01','2012-05-01',5);
5 rows in set (23.76 sec)
That is extremely fast, but when I called it again I had to kill it after well over an hour!
mysql> CALL TopFromBigTable('2012-04-01','2012-05-01',5);
---TRANSACTION 1484EF5C, ACTIVE 3571 sec fetching rows, thread declared inside InnoDB 193
mysql tables in use 2, locked 1
MySQL thread id 466174, OS thread handle 0x7f3616ab4700, query id 33098684 localhost root Copying to tmp table
More tests:
mysql> CALL TopFromBigTable('2012-05-01','2012-05-04',5);
5 rows in set (1.28 sec)
mysql> CALL TopFromBigTable('2012-05-01','2012-05-05',5);
5 rows in set (1.55 sec)
mysql> CALL TopFromBigTable('2012-05-01','2012-05-06',5);
5 rows in set (1 hour 47 min 37.99 sec)
The details
Table
CREATE TABLE `BigTable` (
  `BigTableID` int(11) NOT NULL,
  `AnotherID` int(11) NOT NULL,
  `Type` char(2) COLLATE utf8_unicode_ci DEFAULT NULL,
  `StartTime` datetime NOT NULL,
  `EndTime` datetime DEFAULT NULL,
  PRIMARY KEY (`BigTableID`),
  KEY `Type` (`Type`),
  KEY `StartTime` (`StartTime`),
  KEY `EndTime` (`EndTime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Query (note: in making this generic, I originally had the GROUP BY wrong)
DELIMITER $$
CREATE PROCEDURE `TopFromBigTable` (
    $StartDate DATETIME,
    $EndDate DATETIME,
    $ResultLimit INT
)
BEGIN
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT
        `Type`,
        COUNT(*) AS Count
    FROM
        `BigTable`
    WHERE
        `StartTime` > $StartDate
        AND `StartTime` < $EndDate
    GROUP BY
        `Type`
    ORDER BY
        Count DESC
    LIMIT $ResultLimit;
    COMMIT;
END $$
DELIMITER ;
Execution plan
EXPLAIN EXTENDED ...
id: 1
select_type: SIMPLE
table: BigTable
type: range
possible_keys: StartTime
key: StartTime
key_len: 8
ref: NULL
rows: 16446226
filtered: 100.00
Extra: Using where; Using temporary; Using filesort
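For reference, a rough sketch of checks that could be run from a second session while the slow call is executing. The plan shows Using temporary; Using filesort and the thread state is Copying to tmp table, so the temporary-table limits and counters seem worth a look, together with the InnoDB buffer pool figures. Whether either of these is the actual cause here is an assumption, not something the output above proves.

```sql
-- Run from a second mysql client session while the slow CALL is in progress.

-- What the server thread is doing right now (see the State column).
SHOW FULL PROCESSLIST;

-- Transaction and buffer pool details, the same kind of output quoted above.
SHOW ENGINE INNODB STATUS\G

-- Limits for in-memory temporary tables; the smaller of the two applies.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Temporary tables created so far, in memory and on disk.
-- Compare the values before and after one run of the procedure.
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- Buffer pool size versus reads that had to go to disk.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
```

If Innodb_buffer_pool_reads climbs steeply during the slow run while it stays nearly flat during the fast ones, that would point toward the working set no longer fitting in memory.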
I’m running on a dedicated reporting database, so not all the normal rules apply. For example, I read uncommitted in an attempt to lower overhead, since accuracy is non-critical and this database hasn’t been updated in the last 6 hours. I would like to benchmark how much this actually helps (if at all), but I cannot reliably time the stored procedure!
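One possible way to compare isolation levels, sketched here as an assumption rather than a tested benchmark: run the bare SELECT outside the procedure, set the isolation level for the whole session, and use a date range that is known to finish quickly. SQL_NO_CACHE asks the server not to serve the result from the query cache on versions that have one. The literal dates are taken from the fast test runs above.

```sql
-- Run 1: READ UNCOMMITTED for every transaction in this session.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT SQL_NO_CACHE `Type`, COUNT(*) AS `Count`
FROM `BigTable`
WHERE `StartTime` > '2012-05-01'
  AND `StartTime` < '2012-05-04'
GROUP BY `Type`
ORDER BY `Count` DESC
LIMIT 5;

-- Run 2: the InnoDB default level, same query, same range.
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

SELECT SQL_NO_CACHE `Type`, COUNT(*) AS `Count`
FROM `BigTable`
WHERE `StartTime` > '2012-05-01'
  AND `StartTime` < '2012-05-04'
GROUP BY `Type`
ORDER BY `Count` DESC
LIMIT 5;
```

Running each variant more than once and discarding the first timing helps separate the effect of the isolation level from the effect of a cold or warm buffer pool.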
This is a tough one… If it’s really a big table, adding an index on StartTime may take a while and a bit of extra space, but it will improve SELECT speed provided you specify USE INDEX, according to http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Be sure to re-index depending on how many rows are inserted on a daily basis… use your discretion for that.
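As a rough sketch of what this could look like: the first statement adds the index hint to the query from the question, and the second reads "re-index" as refreshing the index statistics, which is my interpretation. The EXPLAIN output in the question already picks the StartTime key, so the hint alone may not change much. The third statement is a further assumption on my part: a composite index, given the hypothetical name StartTime_Type, that contains both columns the query touches, so the rows could be counted from the index without visiting the table.

```sql
-- Index hint on the existing StartTime key.
SELECT `Type`, COUNT(*) AS `Count`
FROM `BigTable` USE INDEX (`StartTime`)
WHERE `StartTime` > '2012-05-01'
  AND `StartTime` < '2012-05-06'
GROUP BY `Type`
ORDER BY `Count` DESC
LIMIT 5;

-- Refresh index statistics after large batches of inserts.
ANALYZE TABLE `BigTable`;

-- Hypothetical composite index covering both StartTime and Type.
-- Building it on a table this size may take a long time.
ALTER TABLE `BigTable`
    ADD INDEX `StartTime_Type` (`StartTime`, `Type`);
```

If the composite index is used, the Extra column of EXPLAIN should include Using index. Using temporary and Using filesort would probably remain, because the grouping is on Type while the range condition is on StartTime.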
While researching your interesting issue, I came across this as well: http://www.petefreitag.com/item/613.cfm