I am trying to obtain a random row from the result of an SQL query.
My query is as follows:
SET @rank = 0;
SELECT * FROM(
SELECT (@rank:=@rank+1) AS num, ...
FROM ...
WHERE ....) as raw
WHERE raw.num = FLOOR(1 + (RAND() * @rank)) LIMIT 1
The general idea is that each row of the inner query result is given a unique number (num). I have manually checked that this is indeed the case and that every row is numbered.
The last line is causing me distress. As it stands, WHERE num = FLOOR(1 + (RAND() * @rank)) LIMIT 1 returns what I want, but only part of the time. It appears to return random rows within the correct range (which for the example I am testing the query on is 0-1299). However, roughly one in every three queries returns absolutely nothing.
Ok, so I thought maybe it was a double precision issue, so I tried using >= as follows: WHERE num >= FLOOR(1 + (RAND() * @rank)) LIMIT 1. The result in this case is confusing me. With this code I always get a result, but the number of the row returned is always < 100.
Call FLOOR(1 + (RAND() * @rank)) x. When I use = rather than >=, the results confirm that x must (in some cases) be greater than 1000. However, when I use >=, the fact that the condition is satisfied by the returned row suggests that x must always be less than 100?
What’s going on? Or how else can I solve my problem?
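I think the problem is that the RAND() function in your query is being called multiple times, once for each row returned from raw. If that’s what’s happening, then it’s possible that no row satisfies the predicate, since each row is compared to a different target. (Is the first row the fifth row? Is the second row the third row? etc.) That would fit the empty results you see: with N rows each tested against its own independent target, the chance that none matches is (1 - 1/N)^N, which is about 37% for large N. It would also fit the >= case: row k passes with probability k/N, and LIMIT 1 typically returns the first row that passes, which tends to be a low-numbered one.
I would move the call to RAND() and the initial assignment of @rank to the beginning of the query, something like this: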
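A minimal sketch of the single-statement form, assuming MySQL. The table items and its columns id, name and active are hypothetical stand-ins for the elided parts of your query, and the alias init and the variable @target_rand are names made up for illustration. It relies on the one-row init subquery being evaluated before raw is numbered, which MySQL does in practice but does not guarantee; assigning user variables inside a SELECT is also deprecated as of MySQL 8.0.

```sql
-- Hypothetical schema: items(id, name, active) replaces the elided "..." parts.
SELECT raw.*
  FROM ( -- one-row subquery: resets the counter and draws RAND() exactly once
         SELECT @rank := 0, @target_rand := RAND()
       ) init
 CROSS JOIN
       ( -- number every qualifying row 1..N
         SELECT (@rank := @rank + 1) AS num, t.id, t.name
           FROM items t
          WHERE t.active = 1
       ) raw
    -- @target_rand is fixed, so every row is compared to the same target
 WHERE raw.num = FLOOR(1 + (@target_rand * @rank))
 LIMIT 1;
```

Because @target_rand lies in [0, 1), the target always falls between 1 and the final value of @rank, so exactly one row should match whenever the inner query returns any rows.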
Or, in keeping with your pattern of using a separate SET statement:
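A sketch of the separate-SET variant, again assuming MySQL, with a hypothetical table items (columns id, name, active) standing in for the elided parts of your query and @target_rand as an illustrative variable name. Here RAND() is evaluated once in its own SET statement, before the SELECT runs.

```sql
-- Hypothetical schema: items(id, name, active) replaces the elided "..." parts.
SET @rank = 0;
SET @target_rand = RAND();   -- drawn once, outside the SELECT

SELECT raw.*
  FROM ( SELECT (@rank := @rank + 1) AS num, t.id, t.name
           FROM items t
          WHERE t.active = 1
       ) raw
    -- same fixed target for every row; @rank holds the row count by now
 WHERE raw.num = FLOOR(1 + (@target_rand * @rank))
 LIMIT 1;
```

Both SET statements have to be re-run before each execution of the SELECT; otherwise @rank keeps counting from its previous value and the target can no longer be reached by any row number in a consistent way.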
(I happen to prefer the former, as it runs as a single statement; it’s not dependent on user variables being set outside of the SELECT statement.)
But either of those should make sure that the call to the RAND() function happens exactly once (at the beginning of your query).
Other than that, I don’t have a good explanation as to the behavior you are seeing.