What would be the most efficient way to select a non previously selected pair of different random values from a column of unique (non repeated) values?
My current approach is (keeping every pair of values already associated in a separate “mytable_associations” table):
SELECT * FROM
(
SELECT id,count(*) AS associations_count FROM mytable
INNER JOIN mytable_associations
WHERE (myvalue=myvalue1 OR myvalue=myvalue2)
GROUP BY myvalue
HAVING associations_count<(SELECT count(*) FROM mytable)-1
ORDER BY rand() limit 1
) mytable1
LEFT JOIN
(SELECT myvalue AS myvalue2 FROM mytable) mytable2
ON mytable1.myvalue1<>mytable2.myvalue2
WHERE
(
SELECT myvalue1 FROM mytable_associations
WHERE
myvalue1=mytable1.myvalue1 AND myvalue2=mytable2.myvalue2
OR
myvalue1=mytable2.myvalue2 AND myvalue2=mytable1.myvalue1
) IS NULL;
(And then of course update mytable_associations with this new association)
Which, as you can see, could hugely benefit from some optimization.
(Sorry about the poor indentation in the code, I really don’t know how to indent mysql commands).
Can you help me?
(P.S. This is my first question ever posted here: Sure I’m doing lots of things wrong and I’d understand the consequent flaming, but please don’t be too hard on me 😉 )
Any solution involving
order by rand()is going to be inefficient. For alternatives, see:To exclude numbers you’ve already picked, here’s how I’d do it (this is pseudocode):
By ensuring that myvalue1 < myvalue2, and storing them in that order in mytable_associations, you can simplify the join.