I stumbled upon a basic discrete math/probability question and I wanted to get some ideas for improvements over my solution.
Assume you are given a collection (an alphabet, the natural numbers, etc.). How do you ensure that you draw a certain value X from this collection with a given probability P?
I’ll explain my naïve solution with an example:
Collection = {A, B}
X = A, P = 1/4
We build an array v = [A, B, B, B] and we use a rand function to uniformly sample the indices of the array, i.e., {0, 1, 2, 3}
This approach works, but isn’t efficient: the smaller P, the bigger the memory storage of v. Hence, I was wondering what ideas the stackoverflow community might have in improving this.
Thanks!
Partition the interval [0,1] into disjoint intervals whose union is [0,1]. Create the size of each partition to correspond to the probability of selecting each event. Then simply sample randomly from [0,1], evaluate which of your partitions the result lies in, then look up the selection that corresponds to that interval. In your example, this would result in the following 2 intervals [0,1/4) and [1/4,1] – generate a random uniform value from [0,1]. If your sample lies in the first interval then your selection
X = A, if in the other interval thenX = B.