I’m trying to grab a random row from a table where the data doesn’t change. I’ve read that people try ORDER BY RAND() which is terrible for large datasets and doesn’t scale well.
I’ve also seen the solution that is to get SQL to get a random row between a minimum/maximum range like so: FLOOR(MAX(needed_id) * RAND) but this would only work when the rows are sequential: 1,2,3,4,5,6,7,8,9,10.
The data I need to pull out isn’t sequential, for example: 1,2,3,4,10,11,12,13
So I’m thinking there are two solutions:
1st Solution:
Keep running this: FLOOR(MAX(needed_id) * RAND) until I receive a row of the right type (1/6 chance)
2nd Solution:
Create a duplicate table (as my data never changes) like so:
temp_id | needed_id | type
1 1 1
2 4 1
3 7 2
3 8 2
So I can pull out a random temp_id using this method: FLOOR(MAX(temp_id) * RAND) – WHERE type = 1
What do you think? I’m likely to run the 1st solution about 6 times until I receive the correct row, but in the 2nd solution it would work straight away but requires another table.
Your statement
is not compeletely correct: The
floor()andmax()examples do work on non-sequential rows, because you would do somehting likeWHERE id >= FLOOR(RAND()*MAX(id)) LIMIT 1So you take the closest ID to the random hit you’re getting.
This does have a slight preference for hits that are directly after a big gap in your sequence, but that might not be too bad, depending on your dataset.
So depending on how much problem you’d have with this slight preference, how your dataset is, etc etc this could still be the best sollution.
Because it is unclear to some, the usage of the functions is not a problem:
MAXis quick on an indexed field. You don’t need to count all the rows (slow on innoDB), you just need to traverse your BTREE index, so you’ll find this value inlogtime. This is near-instantFLOORis just a mathematical function which will perform in linear time. Just asRAND. Mind you thatORDER BY rand()is not slow because ofrand, but because you need to order the complete table! this is not a problem of rand, but of order.Now you have a query that does something like:
Which is ofcourse very quick on an indexed field. Remember that you got that
48(an example) not by doing any sort of table scan.