Which is more efficient (when managing over 100K records):
A. MySQL
SELECT * FROM user ORDER BY RAND();
Of course, after that I would already have all the fields from that record.
B. PHP
Use memcached to have $cache_array hold all the data from “SELECT id_user FROM user ORDER BY id_user” for 1 hour or so… and then:
$id = $cache_array[array_rand($cache_array)]; // array_rand() returns a random key, not the stored id
Of course, after that I have to make a MySQL call with:
SELECT * FROM user WHERE id_user = $id;
So… which is more efficient? A or B?
The proper way to answer this kind of question is to do a benchmark. Do a quick and dirty implementation each way and then run benchmark tests to determine which one performs better.
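A quick and dirty timing harness for that comparison might look like the sketch below. The benchmark() helper and the two placeholder closures are hypothetical, not part of the question; in a real test each closure would run one of the candidate approaches against the actual user table.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $fn.
function benchmark(callable $fn, int $iterations = 100): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

// Placeholder workloads so the sketch runs without a database.
// Replace each body with the real query code for option A and option B.
$candidates = [
    'A' => function () { usleep(200); },
    'B' => function () { usleep(100); },
];

foreach ($candidates as $name => $fn) {
    printf("%s: %.6f s per call\n", $name, benchmark($fn, 50));
}
```

Run it several times, against a table of realistic size, since caching on the MySQL side can make the first run unrepresentative.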
Having said that, ORDER BY RAND() is known to be slow because it’s impossible for MySQL to use an index. MySQL will basically run the RAND() function once for each row in the table and then sort the rows based on what came back from RAND().
Your other idea of storing all id_user values in memcached and then selecting a random element from the array might perform better if the overhead of memcached proves to be less than the cost of a full table scan. If your dataset is large or staleness is a problem, you may run into issues though. Also you’re adding some complexity to your application. I would try to look for another way.
I’ll give you a third option which might outperform both your suggestions: Select a COUNT(id_user) of the rows in your user table and then have PHP generate a random number between 0 and the result of COUNT(id_user) minus 1, inclusive. Then do a SELECT * FROM user LIMIT 1 OFFSET random-number-generated-by-php;.
Again, the proper way to answer these types of questions is to benchmark. Anything else is speculation.
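The count-then-offset option could be sketched in PHP roughly as follows. This is only an illustration under some assumptions: the randomUser() function is a hypothetical name, the connection goes through PDO, and an in-memory SQLite table with a made-up name column stands in for the real MySQL user table so the snippet runs on its own. For MySQL you would swap in a mysql: DSN.

```php
<?php
// Hypothetical helper: returns one random row from `user`, or null if none.
function randomUser(PDO $pdo): ?array
{
    $count = (int) $pdo->query('SELECT COUNT(*) FROM user')->fetchColumn();
    if ($count === 0) {
        return null;
    }
    // Random offset between 0 and count - 1, inclusive.
    $offset = random_int(0, $count - 1);
    $stmt = $pdo->prepare('SELECT * FROM user LIMIT 1 OFFSET :offset');
    // Bind as an integer so the offset is not sent as a quoted string.
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    // fetch() returns false if rows were deleted between the two queries.
    return $row === false ? null : $row;
}

// Stand-in data (assumption): in-memory SQLite instead of the real MySQL table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE user (id_user INTEGER PRIMARY KEY, name TEXT)');
for ($i = 1; $i <= 5; $i++) {
    $pdo->exec("INSERT INTO user (id_user, name) VALUES ($i, 'user$i')");
}

$row = randomUser($pdo);
printf("picked id_user %d\n", $row['id_user']);
```

Note that this still costs two queries per random pick, so it belongs in the benchmark alongside options A and B rather than being assumed faster.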