I have a query that retrieves a random row from a result set. I do not want to use ORDER BY RAND() because it is inefficient: it generates a random value for every row of the result set and then sorts all of them.
My method is as follows:
- generate a single random number in [0, 1)
- give each row of the query result a unique rank, i.e. the first row gets 1, the second row 2, and so forth
- scale the random number to an integer between 1 and the number of rows in the result
- return the row whose rank equals that integer
Example query:

    SELECT *
    FROM (SELECT @rand := RAND(), @rank := 0) r1
    CROSS JOIN
         (SELECT (@rank := @rank + 1) AS num, A.id
          FROM A JOIN B ON A.id = B.id
          WHERE B.number = 42
         ) r2
    WHERE num = FLOOR(1 + @rand * @rank)
    LIMIT 1

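This works for retrieving one row, but I want 10 random rows instead. Changing LIMIT 1 to LIMIT 10 doesn’t work: only one rank can equal the random number, and even with num >= FLOOR(...) instead, the query returns fewer than 10 rows whenever fewer than 10 rows remain from the chosen rank to the end of the result set.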
The only solutions I can think of are: generate 10 random numbers in the SQL query, check that they are all different from each other, and add a WHERE num = random_number_1 condition for each; or call the query 10 times, checking that the selected rows are unique. I wouldn’t know how to do the former, and the latter seems rather inefficient, unless there is some cache that would make running the same query repeatedly very fast.
Does anyone have any ideas? Thank you.
You could try the following:
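A minimal sketch of that idea, assuming MySQL and the question's tables A and B (the limit of 5 is illustrative): number the matching rows, pick a random starting rank, and return up to five consecutive rows from there. It relies on MySQL materializing the derived table `t`, and so finishing the count in `@rank`, before the outer `WHERE` is evaluated. The manual does not promise an evaluation order for user variables, and MySQL 8.0 deprecates assigning them inside expressions, so treat this as 5.x-era style.

```sql
-- Fix the random draw and reset the counter before the query runs.
SET @rand = RAND();
SET @rank = 0;

-- Number the matching rows 1..n, then keep up to 5 consecutive rows
-- starting at a random rank. Once the derived table t is materialized,
-- @rank holds n; GREATEST(..., 1) keeps the start at rank 1 when there
-- are 5 or fewer matches.
SELECT t.num, t.id
FROM (SELECT (@rank := @rank + 1) AS num, A.id
      FROM A JOIN B ON A.id = B.id
      WHERE B.number = 42) t
WHERE t.num >= FLOOR(1 + @rand * GREATEST(@rank - 4, 1))
ORDER BY t.num
LIMIT 5;
```

Because the rows are consecutive ranks, repeated runs return overlapping windows rather than independent samples.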
The results will be random unless the result set is smaller than (or the same size as) the limit. If this is a problem, you can wrap the whole thing:
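A sketch of the wrapped form, assuming MySQL user variables and the question's tables A and B: the windowed query becomes a derived table `w`, and the outer query shuffles its rows with `ORDER BY RAND()`. The same caveat about user-variable evaluation order applies.

```sql
SET @rand = RAND();
SET @rank = 0;

-- Inner query: up to 5 consecutive ranks starting at a random rank.
-- Outer query: ORDER BY RAND() shuffles only those rows.
SELECT w.id
FROM (SELECT t.num, t.id
      FROM (SELECT (@rank := @rank + 1) AS num, A.id
            FROM A JOIN B ON A.id = B.id
            WHERE B.number = 42) t
      WHERE t.num >= FLOOR(1 + @rand * GREATEST(@rank - 4, 1))
      ORDER BY t.num
      LIMIT 5) w
ORDER BY RAND();
```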
This randomizes only the small number of output rows (at most 5), which is efficient.
Of course, you can always use a temporary table:
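A sketch with a temporary table, assuming MySQL and the question's tables; `matches` is a hypothetical name. The table's `AUTO_INCREMENT` column plays the role of the rank, five random ranks are drawn once in a derived table, and they are joined back to the stored rows. Repeated draws collapse in the `UNION`, so this can return fewer than five rows.

```sql
DROP TEMPORARY TABLE IF EXISTS matches;
CREATE TEMPORARY TABLE matches (
    num INT AUTO_INCREMENT PRIMARY KEY,  -- rank 1..n in a fresh table
    id  INT NOT NULL
);

INSERT INTO matches (id)
SELECT A.id
FROM A JOIN B ON A.id = B.id
WHERE B.number = 42;

SET @n = (SELECT COUNT(*) FROM matches);

-- Draw five random ranks once, inside a derived table, so RAND() is not
-- re-evaluated for every row of matches. UNION (not UNION ALL) drops
-- repeated ranks, so fewer than five rows may come back.
SELECT m.id
FROM (SELECT FLOOR(1 + RAND() * @n) AS num
      UNION SELECT FLOOR(1 + RAND() * @n)
      UNION SELECT FLOOR(1 + RAND() * @n)
      UNION SELECT FLOOR(1 + RAND() * @n)
      UNION SELECT FLOOR(1 + RAND() * @n)) r
JOIN matches m ON m.num = r.num;
```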
If you want to guarantee that you get five unique rows, then you can use a second temporary table:
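A sketch of the two-table version as a stored procedure, assuming MySQL and the `mysql` command-line client (which is what `DELIMITER` is for); `pick_random_rows`, `matches` and `picks` are hypothetical names. The second table's `PRIMARY KEY` makes `INSERT IGNORE` discard a rank that was already drawn, and the loop keeps drawing until it holds `k` distinct ranks, or all of them if there are fewer than `k` matches.

```sql
DROP PROCEDURE IF EXISTS pick_random_rows;
DELIMITER //
CREATE PROCEDURE pick_random_rows(IN k INT)
BEGIN
    DECLARE n INT DEFAULT 0;

    -- First temporary table: matching rows numbered 1..n
    DROP TEMPORARY TABLE IF EXISTS matches;
    CREATE TEMPORARY TABLE matches (
        num INT AUTO_INCREMENT PRIMARY KEY,
        id  INT NOT NULL
    );
    INSERT INTO matches (id)
    SELECT A.id
    FROM A JOIN B ON A.id = B.id
    WHERE B.number = 42;
    SELECT COUNT(*) INTO n FROM matches;

    -- Second temporary table: the chosen ranks. The PRIMARY KEY makes
    -- INSERT IGNORE silently skip a rank that was already drawn.
    DROP TEMPORARY TABLE IF EXISTS picks;
    CREATE TEMPORARY TABLE picks (num INT PRIMARY KEY);

    -- Draw until there are k distinct ranks (or all n if n < k)
    WHILE (SELECT COUNT(*) FROM picks) < LEAST(k, n) DO
        INSERT IGNORE INTO picks (num) VALUES (FLOOR(1 + RAND() * n));
    END WHILE;

    SELECT m.id
    FROM picks p
    JOIN matches m ON m.num = p.num;
END //
DELIMITER ;

CALL pick_random_rows(5);
```

When `k` is small relative to the number of matches, repeats are rare and the loop runs only slightly more than `k` times.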