I have a database table with about 30k records.
I want to randomly select a record one at a time (when demanded by users), delete the record from the table, and insert it in another table.
I’ve heard/found that doing ORDER BY RAND() can be quite slow. So I’m using this algorithm (pseudo code):
lowest = getLowestId();   // get lowest primary key id from table
highest = getHighestId(); // get highest primary key id from table
do
{
    id = rand(lowest, highest); // get random number in the range of lowest id to highest id
    idExists = checkIfRandomIdExists(id);
}
while (!idExists);
row = getRow(id);
process(row);
delete(id);
Right now, with 30k records, I seem to get random ids very quickly. However, as the table shrinks to 15k, 10k, 5k, 100 rows, and so on (which may take months), I’m concerned that this might begin to get slower.
Can I do anything to make this method more efficient, or is there a row count at which I should switch to ORDER BY RAND() instead (e.g., when 5k rows are left)?
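To put a rough number on that concern: assuming ids are drawn uniformly from the range lowest..highest and the range stays about as wide while rows are deleted at random, each attempt hits an existing id with probability remaining / (highest - lowest + 1), so the expected number of attempts is the inverse of that. The helper name `expected_attempts` below is hypothetical, and the figures are an illustration, not a measurement.

```python
def expected_attempts(lowest: int, highest: int, remaining: int) -> float:
    """Mean number of random draws until an existing id is hit.

    Assumes uniform draws over [lowest, highest] and `remaining` live ids
    inside that range (geometric distribution, mean = 1 / hit probability).
    """
    if remaining <= 0:
        raise ValueError("no rows left; the retry loop would never terminate")
    span = highest - lowest + 1  # number of candidate ids in the range
    return span / remaining


if __name__ == "__main__":
    # Illustrative range of 1..30000, matching the ~30k rows in the question.
    for remaining in (30_000, 15_000, 5_000, 100):
        print(remaining, expected_attempts(1, 30_000, remaining))
```

Under these assumptions the retry loop needs about 6 lookups per pick at 5k rows and about 300 at 100 rows, so the cost grows in inverse proportion to the rows left.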
I would recommend a Fisher-Yates shuffle stored in a separate table instead. To generate this, create a table like:
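The table definition that belonged here did not survive, so the following is only a minimal sketch of what such a table might look like. The names `Shuffle` and `SequentialId` come from the answer; the column `RecordId` is an assumed name for the reference to the source row. It is run through Python's `sqlite3` so that it is self-contained; the MySQL DDL would differ in details such as column types.

```python
import sqlite3


def create_shuffle_table(conn: sqlite3.Connection) -> None:
    """Create the Shuffle table (no foreign key constraint, as in the answer)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS Shuffle (
            SequentialId INTEGER PRIMARY KEY,     -- position in the random order
            RecordId     INTEGER NOT NULL UNIQUE  -- assumed name: id of the source row
        )
        """
    )


# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
create_shuffle_table(conn)
```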
Notably, don’t bother with the foreign key constraint. In SQL Server, for instance, I would say to add a foreign key constraint with ON DELETE CASCADE; if you have a storage engine for which that would be workable in MySQL, go for it.
Now, in the language of your choice:
Shuffle table in order.
Now, you have a random order, so you can just INNER JOIN to the Shuffle table, then ORDER BY Shuffle.SequentialId to find the first record. You can delete the record from Shuffle manually if you have no way to do ON DELETE CASCADE.
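The whole approach might be sketched as follows. This is an illustration under stated assumptions, not the answerer's original code: the tables `records` and `processed`, the columns `payload` and `RecordId`, and the functions `build_shuffle` and `take_next` are all hypothetical names, and `sqlite3` stands in for MySQL so the example runs on its own. The shuffle itself is delegated to `random.shuffle`, which CPython implements as an in-place Fisher-Yates shuffle.

```python
import random
import sqlite3


def build_shuffle(conn, rng=random):
    """Fill Shuffle once with all record ids in random order."""
    ids = [row[0] for row in conn.execute("SELECT id FROM records")]
    rng.shuffle(ids)  # in-place Fisher-Yates shuffle
    conn.executemany(
        "INSERT INTO Shuffle (SequentialId, RecordId) VALUES (?, ?)",
        enumerate(ids, start=1),  # SequentialId = position in the shuffled list
    )
    conn.commit()


def take_next(conn):
    """Move the next record in shuffle order to `processed`; None when empty."""
    row = conn.execute(
        """
        SELECT r.id, r.payload
        FROM records AS r
        INNER JOIN Shuffle AS s ON s.RecordId = r.id
        ORDER BY s.SequentialId
        LIMIT 1
        """
    ).fetchone()
    if row is None:
        return None
    record_id, payload = row
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute(
            "INSERT INTO processed (id, payload) VALUES (?, ?)", (record_id, payload)
        )
        # Manual delete from Shuffle, since no ON DELETE CASCADE is set up here.
        conn.execute("DELETE FROM Shuffle WHERE RecordId = ?", (record_id,))
        conn.execute("DELETE FROM records WHERE id = ?", (record_id,))
    return record_id, payload


# Demo on an in-memory database with ten made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE records   (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE processed (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE Shuffle   (SequentialId INTEGER PRIMARY KEY,
                            RecordId INTEGER NOT NULL UNIQUE);
    """
)
conn.executemany(
    "INSERT INTO records (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 11)],
)
build_shuffle(conn)

drawn = []
while (item := take_next(conn)) is not None:
    drawn.append(item[0])
```

Each pick is then one indexed lookup on the smallest remaining SequentialId, so the cost does not grow as the table empties. If several users can request a record at the same time, the select and the deletes would need to run under a lock or in one transaction so that two requests do not receive the same row.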