I have a database where users are represented by hashes. Each user (hash) has associated values (pertaining to the music track that he was listening to). Since, a user can listen to more than one track there are repeat instances of the user and associated data (with data being different as it is now a different track).
What I would like to do is select ~10 users at random from this database, and then find their associated data.
Currently, the code that I am using is this:
SELECT *
FROM `tblPlayLogV4`
WHERE `titleId` <> 0 AND `hash` IN (SELECT `hash` FROM `tblPlayLogV4` WHERE `titleId` <> 0 AND RAND() <= 0.1 GROUP BY `hash` HAVING COUNT(`hash`) > 500);
Why RAND() – because LIMIT is not allowed in inner queries. Idea for RAND() here – http://www.rndblog.com/how-to-select-random-rows-in-mysql/
The above query takes ages to complete.
If however, I run the inner query separately, it finishes in 4.53s. I then hard code the result of the inner query in the outer query, and that finishes in about 275 ms. The separated queries are presented below:
SELECT `hash` FROM `tblPlayLogV4` WHERE `titleId` <> 0 AND RAND() <= 0.1 GROUP BY `hash` HAVING COUNT(`hash`) > 500);
SELECT * FROM `tblPlayLogV4` WHERE `hash` IN ('-29e291921cccd06a5813bca17b7f7c3','-2c08232108dcd93c443d821165c2c79','-58285c1602072da713e51cc6cdc6313','-5bcc2c42482d5805277609a84474aef','-79ecab520d661a1d624de7e7b04f265','-e937c753a96fc9e441f83af97b08489','04d3f1e91e4e92970819190346405d2d','3f9f0cd502de38d47e39367cdfdd6722') AND `titleId`<>0;
Can someone please explain to me why this is happening? What is it that I am doing wrong? And if there is a better way for me to formulate my query, do tell me.
Number of entries in the database: 6,322,605
As Dems said, your existing query is executing your RAND() selection in the sub-query for every record in your main query’s table.
So try rewriting your main query like this: