I have a list of 100,000 objects. Every list element has a “weight” associated with it that is a positive int from 1 to N.
What is the most efficient way to select a random element from the list? I want the behavior that my distribution of randomly chosen elements is the same as the distribution of weights in the list.
For example, if I have a list L = {1,1,2,5}, I want the 4th element to be selected 5/9ths of the time, on average.
Assume inserts and deletes are common on this list, so any approach using “integral area tables” would need to be updated often – hoping there is a solution with O(1) runtime and O(1) extra memory required.
You can use an augmented binary search tree to store the elements, along with the sum of the weights in each subtree. This lets you insert and delete elements and weights however you want. Both sampling and updates require O(lg n) time per operation, and space usage is O(n).
Sampling is accomplished by generating a random integer in [1, S], where S is the sum of all weights (S is stored at the root of the tree), and performing binary search using the weight-sums stored for each subtree.