I’m having trouble understanding the documentation for SciPy’s scipy.stats.hypergeom functions. In my program, I consider various decks of cards and try to find the probability of various draws. The hypergeom class seems to have exactly this, but its documentation assumes a bunch of terminology knowledge that I don’t have. Googling leads me to Wikipedia and Wolfram MathWorld, both of which assume that if you’re asking about this kind of thing, you’ve read everything from the dang Principia Mathematica forward and just need a little refresher – so they’re not actually helpful. Because this problem is “how do I apply this specific chunk of code to my problem?” I’m asking Stack Overflow.
I have a problem of the form “if you have a deck of N cards, M of which are the card of interest, what are the odds of having at least 1 copy of the card of interest in the top Q cards?” I also have a problem of the form “if you have a deck of N cards, M of which are the card of interest, how many cards must you draw from the deck to have a 90% chance of one of them being a copy of the card of interest?” The former problem is very close to the example problem given in the SciPy documentation, but it’s not the same thing, and the list of methods is all jargon to me – I can’t actually tell which of them is the one that I need. I also can’t tell which method to use for the latter type of problem.
What do the methods of scipy.stats.hypergeom actually do, what are their arguments, and how can I apply them to my problems? Pretend I’m a moderately bright high-school student and not a mathematics PhD candidate.
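hypergeom.pmf(k, M, n, N) returns the probability that: from M cards, n of which are marked, if you randomly choose N cards without replacement, exactly k cards will be marked. Note that SciPy’s letters differ from yours: its M is your deck size N, its n is your M, and its N is your Q.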
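To make the argument order concrete, here is a minimal sketch. The 52-card deck with 4 aces and a 5-card hand is an assumed example, not something from the question.

```python
from scipy.stats import hypergeom

# Assumed example: 52 cards in the deck (SciPy's M), 4 aces (SciPy's n),
# and a 5-card hand (SciPy's N).
deck_size, marked, drawn = 52, 4, 5

# Probability of exactly k aces in the hand, for every possible k.
probs = [hypergeom.pmf(k, deck_size, marked, drawn) for k in range(marked + 1)]
for k, p in enumerate(probs):
    print(f"exactly {k} aces: {p:.4f}")
```

The probabilities over all possible k add up to 1, which is a handy sanity check that the arguments are in the right order.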
So you can get your desired answer (using your variable names) by summing hypergeom.pmf(k, N, M, Q) over k = 1, 2, 3… Q
(the sum of the odds that 1 card is marked, 2 cards are marked, 3 cards are marked… Q cards are marked).
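That sum might be sketched as follows. The name pick_Q is the one used later in this answer; the body is a reconstruction under that assumption, and the 60-card deck in the usage line is made up for illustration.

```python
from scipy.stats import hypergeom

def pick_Q(N, M, Q):
    """Chance of at least one card of interest among the top Q cards of an
    N-card deck that holds M cards of interest (the question's letters)."""
    # k can never exceed the cards drawn (Q) or the copies in the deck (M).
    return sum(hypergeom.pmf(k, N, M, Q) for k in range(1, min(M, Q) + 1))

# Assumed usage: 60-card deck, 4 copies of the card, top 7 cards.
print(pick_Q(60, 4, 7))
```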
Luckily, there is a quicker way: the probability that at least one marked card is taken is the flip side of the probability that no marked card is picked. So instead you can compute 1 - hypergeom.pmf(0, N, M, Q).
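A minimal sketch of the complement version, again using the question's letters (N cards, M of interest, Q drawn); the 52/4/5 numbers are an assumed example.

```python
from scipy.stats import hypergeom

def pick_Q(N, M, Q):
    """Chance of at least one card of interest in the top Q cards."""
    # P(at least one) = 1 - P(exactly zero)
    return 1.0 - hypergeom.pmf(0, N, M, Q)

# Assumed usage: 52-card deck, 4 aces, 5-card hand.
print(pick_Q(52, 4, 5))
```

This needs only one pmf evaluation, however many cards are drawn.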
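For your second question, there don’t appear to be any functions that do what you want; however, you can start with a simple search: try Q = 1, 2, 3… and stop at the first Q for which the chance of at least one marked card reaches your target (0.9 in your case).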
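One way to sketch that search is below. The helper name how_many_to_pick is hypothetical, and pick_Q is repeated so the snippet runs on its own.

```python
from scipy.stats import hypergeom

def pick_Q(N, M, Q):
    """Chance of at least one card of interest in the top Q cards."""
    return 1.0 - hypergeom.pmf(0, N, M, Q)

def how_many_to_pick(N, M, prob):
    """Smallest Q whose chance of at least one card of interest is >= prob.

    Hypothetical helper: N is the deck size, M the number of cards of interest.
    """
    if M < 1:
        raise ValueError("the deck holds no cards of interest")
    # The chance only grows as Q grows, so the first hit is the smallest Q.
    for Q in range(1, N + 1):
        if pick_Q(N, M, Q) >= prob:
            return Q
    raise ValueError("prob must be at most 1")

# Assumed usage: 52-card deck, 4 aces, 90% target.
print(how_many_to_pick(52, 4, 0.9))
```

A linear scan is fine for deck-sized numbers; for very large N a binary search over Q would work too, since the probability never decreases as Q grows.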
Edit:
.cdf(k, M, n, N): given a deck of M cards, n of which are marked, picking N randomly without replacement, find the odds that k or fewer marked cards are picked. (You can think of this as the running total of .pmf for 0, 1, 2… k, the discrete version of an integral.)
Then .sf(k, M, n, N) is the flip side of .cdf – the odds that more than k marked cards were picked.
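For example, with a 52-card deck holding 4 aces and a 5-card hand, .cdf(0, 52, 4, 5) is the chance of drawing no aces (about 0.66) and .sf(0, 52, 4, 5) is the chance of drawing at least one (about 0.34).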
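The relationship between .pmf, .cdf and .sf can be checked numerically. This is a small sketch with assumed numbers (52 cards, 4 marked, 5 drawn, k = 1), using SciPy’s own letters.

```python
from scipy.stats import hypergeom

M, n, N = 52, 4, 5   # SciPy's letters: deck size, marked cards, cards drawn
k = 1

at_most_k = hypergeom.cdf(k, M, n, N)    # k or fewer marked cards
more_than_k = hypergeom.sf(k, M, n, N)   # more than k marked cards

# .cdf should equal the running total of .pmf from 0 up to k.
running_total = sum(hypergeom.pmf(i, M, n, N) for i in range(k + 1))

print(at_most_k, running_total)
print(at_most_k + more_than_k)   # the two cases together cover everything
```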
Edit2:
Actually, this gives another way of writing the pick_Q function: ‘picking 1 or more marked cards’ can be rephrased as ‘picking more than 0 marked cards’, so pick_Q is simply hypergeom.sf(0, N, M, Q) in your variable names.
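A sketch of that rewrite, in the question's letters; the comparison at the end uses an assumed 52/4/5 example to show it agrees with the 1 - pmf(0) form.

```python
from scipy.stats import hypergeom

def pick_Q(N, M, Q):
    """Chance of more than 0 cards of interest in the top Q cards."""
    return hypergeom.sf(0, N, M, Q)

# Assumed check: both forms should agree up to rounding.
via_sf = pick_Q(52, 4, 5)
via_complement = 1.0 - hypergeom.pmf(0, 52, 4, 5)
print(via_sf, via_complement)
```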