I have two mysql tables:
/* Table users */
CREATE TABLE IF NOT EXISTS `users` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`DateRegistered` datetime NOT NULL,
PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
/* Table statistics_user */
CREATE TABLE IF NOT EXISTS `statistics_user` (
`UserId` int(10) unsigned NOT NULL AUTO_INCREMENT,
`Sent_Views` int(10) unsigned NOT NULL DEFAULT '0',
`Sent_Winks` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`UserId`),
CONSTRAINT `statistics_user_ibfk_1` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Both tables are populated with 10.000 random rows for testing by using the following procedure:
DELIMITER //
CREATE DEFINER=`root`@`localhost` PROCEDURE `FillUsersStatistics`(IN `cnt` INT)
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE dt DATE;
DECLARE Winks INT DEFAULT 1;
DECLARE Views INT DEFAULT 1;
WHILE (i<=cnt) DO
SET dt = str_to_date(concat(floor(1 + rand() * (9-1)),'-',floor(1 + rand() * (28 -1)),'-','2011'),'%m-%d-%Y');
INSERT INTO users (Id, DateRegistered) VALUES(i, dt);
SET Winks = floor(1 + rand() * (30-1));
SET Views = floor(1 + rand() * (30-1));
INSERT INTO statistics_user (UserId, Sent_Winks, Sent_Views) VALUES (i, Winks, Views);
SET i=i+1;
END WHILE;
END//
DELIMITER ;
CALL `FillUsersStatistics`(10000);
The problem:
When I run the EXPLAIN for this query:
SELECT
t1.Id, (Sent_Views + Sent_Winks) / DATEDIFF(NOW(), t1.DateRegistered) as Score
FROM users t1
JOIN statistics_user t2 ON t2.UserId = t1.Id
ORDER BY Score DESC
.. I get this explain:
Id select_type table type possible_keys key key_len ref rows extra
1 SIMPLE t1 ALL PRIMARY (NULL) (NULL) (NULL) 10037 Using temporary; Using filesort
1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test2.t2.UserId 1
The above query gets very slow when both tables have more than 500K rows. I guess it’s because of the ‘Using temporary; Using filesort’ in the explain of the query.
How can the above query be optimized so that it runs faster?
I’m faily sure that the ORDER BY is what’s killing you, since it cannot be properly indexed. Here is a workable, if not particularly pretty, solution.
First, let’s say you have a column named
Scorefor storing a user’s current score. Every time a user’sSent_ViewsorSent_Winkschanges, modify theScorecolumn to match. This could probably be done with a trigger (my experience with triggers is limited), or definitely done in the same code that updates theSent_ViewsandSent_Winksfields. This change wouldn’t need to know the DATEDIFF portion, because it could just divide by the old sum ofSent_Views + Sent_Winksand multiply by the new one.Now you just need to change the
Scorecolumn once per day (if you’re not picky about the precise number of hours a user has been registered). This could be done with a script run by a cron job.Then, just index the
Scorecolumn and SELECT away!Note: edited to remove incorrect first attempt.