I have a list of items. When I create the list each item has equal chance to be selected. But as an item is selected its chance goes down while the others chance goes up. If a new item is added during the process, it should have the highest chance to be selected with its chances falling off as it is selected. I am looking for a good algorithm that can accomplish this is C#.
Generalizaed idea:
I have 5 items, over time all 5 items will be selected 20% of time time. I am trying to keep the selections a close to that 20% as possible, cutting down on outlyers. If one exists it will be selected more/less to bring it back in line.
Here we will engineer a random number generator that has a distribution that favors low values. You can use it to prefer items at the beginning of a list. To decrease the odds of something being selected, move that item down the list. You have a few options for how you want to move the item down the list. Lets review the random variable transformation first.
By applying the following function to a uniform random variable between 0 and 1:
You get a cool distribution that drastically reduces the odds of a larger index
Here is the distribution for a list of size 2
Size 3
Size 4
Now lets discuss the two options for moving the items down the list. I tested two:
ToEnd – Move most recently selected item to end of list
Sort – Keep an associated array of the number of times each item has been selected and sort the list from least to most selected.
I created a simulation to pick from a list and examine the standard deviation of the count that each item was selected. The lower the standard deviation the better. For example, 1 simulation for a list of 10 items where 50 selections where made created the spread:
The Standard Devation for this simulation was
With the ability to run a simulation, I then computed some meta-statistics by running the simulation 500 times and providing the average standard deviation for each method: ToEnd and Sort. I expected for the standard deviation to be high for low # of picks, but in fact for the ToEnd algorithm the Standard Deviation increased with the number of picks. The sort method fixed this.
Here are some test results for a larger set.
With a good test framework, I couldn’t resist trying a different random number transformation. My assumption was if I take the cube root instead of square root of x that my standard deviation would decrease. It did in fact, but I was worried that this would decrease the randomness. Here you can observe a few simulations when the formula is changed to
Now inspect the actual picks. As I thought, its very weighted to the beginning of the list at first. If you want to weight this heavily, you should probably randomize your list before you start.