I’m having trouble understanding the documentation for SciPy’s scipy.stats.hypergeom functions. In my program, I

Asked: June 6, 2026 (2026-06-06T19:59:05+00:00)

I’m having trouble understanding the documentation for SciPy’s scipy.stats.hypergeom functions. In my program, I consider various decks of cards and try to find the probability of various draws. The hypergeom class seems to have exactly this, but its documentation assumes a bunch of terminology knowledge that I don’t have. Googling leads me to Wikipedia and Wolfram MathWorld, both of which assume that if you’re asking about this kind of thing, you’ve read everything from the dang Principia Mathematica forward and just need a little refresher – so they’re not actually helpful. Because this problem is “how do I apply this specific chunk of code to my problem?” I’m asking Stack Overflow.

I have a problem of the form “if you have a deck of N cards, M of which are the card of interest, what are the odds of having at least 1 copy of the card of interest in the top Q cards?” I also have a problem of the form “if you have a deck of N cards, M of which are the card of interest, how many cards must you draw from the deck to have a 90% chance of one of them being a copy of the card of interest?” The former problem is very close to the example problem given in the SciPy documentation, but it’s not the same thing, and the list of methods is all jargon to me – I can’t actually tell which of them is the one that I need. I also can’t tell which method to use for the latter type of problem.

What do the methods of scipy.stats.hypergeom actually do, what are their arguments, and how can I apply them to my problems? Pretend I’m a moderately bright high-school student and not a mathematics PhD candidate.

1 Answer

Editorial Team · Answer 1 · 2026-06-06T19:59:06+00:00

scipy.stats.hypergeom.pmf(k, M, n, N)

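returns the probability that, from a deck of M cards, n of which are marked, a random draw of N cards without replacement contains exactly k marked cards. Note that SciPy's letters differ from yours: its M is your deck size N, its n is your M (the marked cards), and its N is your Q (the cards drawn).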
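
To make the argument order concrete, here is a small sketch that spells out each argument with a descriptive name and cross-checks the result against the textbook counting formula. The names deck_size, marked and drawn are my own labels for illustration, not SciPy's, and the 52/13/4 deck is just an example.

```python
from math import comb
from scipy.stats import hypergeom

deck_size = 52   # SciPy's M: total number of cards
marked = 13      # SciPy's n: how many cards are "of interest"
drawn = 4        # SciPy's N: how many cards are drawn
k = 1            # exactly this many marked cards among the drawn ones

# Probability according to SciPy.
p_scipy = hypergeom.pmf(k, deck_size, marked, drawn)

# The same probability by counting: ways to pick k marked cards and
# (drawn - k) unmarked cards, divided by all possible draws.
p_by_hand = (comb(marked, k) * comb(deck_size - marked, drawn - k)
             / comb(deck_size, drawn))

print(p_scipy, p_by_hand)
```

Both numbers should agree to within floating-point rounding, which is a handy way to confirm you have the arguments in the right order.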

So you can get your desired answer (using your variable-names) by

import scipy.stats

def pick_Q(N, M, Q):
    """
    Given a deck of N cards, where M are marked,
    and Q cards are taken randomly without replacement,
    return the probability that at least one marked card is taken.
    """
    # Add up P(exactly k marked) for k = 1 .. Q (range replaces Python 2's xrange).
    return sum(scipy.stats.hypergeom.pmf(k, N, M, Q) for k in range(1, Q + 1))

(the sum of the probabilities that exactly 1 card is marked, 2 cards are marked, 3 cards are marked… Q cards are marked).

Luckily, there is a quicker way – the probability that at least one marked card is taken is the flip side of the probability that no marked card is picked. So instead you can do

import scipy.stats

def pick_Q(N, M, Q):
    """
    Given a deck of N cards, where M are marked,
    and Q cards are taken randomly without replacement,
    return the probability that at least one marked card is taken.
    """
    return 1. - scipy.stats.hypergeom.pmf(0, N, M, Q)
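
As a quick sanity check, the sketch below computes the same probability both ways for one sample deck. The numbers (a 60-card deck with 4 copies of the card, looking at the top 7) are an assumption chosen purely for illustration.

```python
from scipy.stats import hypergeom

# Illustrative numbers only: 60 cards, 4 marked, 7 drawn.
deck_size, marked, drawn = 60, 4, 7

# Long way: add up P(exactly k marked) for every k >= 1 that can happen.
p_sum = sum(hypergeom.pmf(k, deck_size, marked, drawn)
            for k in range(1, min(marked, drawn) + 1))

# Short way: 1 minus P(no marked card at all).
p_complement = 1.0 - hypergeom.pmf(0, deck_size, marked, drawn)

print(p_sum, p_complement)  # both come out at roughly 0.3995
```

The two results should match up to rounding error, so the shorter version loses nothing.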

For your second question, there don’t appear to be any functions that do what you want; however, you can start with

def how_many_to_pick(N, M, prob):
    """
    Given a deck of N cards, M of which are marked,
    how many do you have to pick randomly without replacement
    to have at least prob probability of picking at least one marked card?
    """
    # Try every draw size up to the whole deck of N cards
    # (stopping at M would give up too early when M is small).
    for q in range(1, N + 1):
        if pick_Q(N, M, q) >= prob:
            return q
    raise ValueError("Could not find a value for q")
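
If the deck is large, the same search can be done without a Python loop, because the hypergeom methods accept NumPy arrays for their arguments. The following is only a sketch of that idea; the function name how_many_to_pick_vectorized and its parameter names are hypothetical, and it uses the "more than 0 marked cards" probability from .sf.

```python
import numpy as np
from scipy.stats import hypergeom

def how_many_to_pick_vectorized(deck_size, marked, prob):
    """
    Smallest number of draws (without replacement) that gives at least
    `prob` probability of seeing at least one marked card.
    """
    # Every possible draw size, from 1 card up to the whole deck.
    draws = np.arange(1, deck_size + 1)
    # P(more than 0 marked cards) for each draw size, computed in one call.
    p_at_least_one = hypergeom.sf(0, deck_size, marked, draws)
    # Indices of the draw sizes that reach the target probability.
    hits = np.nonzero(p_at_least_one >= prob)[0]
    if hits.size == 0:
        raise ValueError("No draw size reaches the requested probability")
    return int(draws[hits[0]])

print(how_many_to_pick_vectorized(60, 4, 0.9))
```

For a 60-card deck with 4 marked cards and a 90% target, this should report 26 draws.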

Edit:

scipy.stats.hypergeom.cdf(k, M, n, N)

Given a deck of M cards, n of which are marked, picking N randomly without replacement, find the probability that k or fewer marked cards are picked. (You can think of this as the running total of .pmf from 0 up to k.)

Then .sf(k, M, n, N) is the flip side of .cdf – the probability that more than k marked cards were picked, i.e. 1 - cdf.

For example,

 k      pmf(k,52,13,4)   cdf(k,52,13,4)   sf(k,52,13,4)
     (exactly k picked)  ( <= k picked)   ( > k picked)
---  -----------------  ---------------  --------------
 0       0.303817527      0.303817527      0.696182473
 1       0.438847539      0.742665066      0.257334934
 2       0.213493397      0.956158463      0.043841537
 3       0.041200480      0.997358944      0.002641056
 4       0.002641056      1.000000000      0.000000000
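
A table like the one above can be regenerated with a few lines; this is a minimal sketch using the same 52-card deck, 13 marked cards and 4 draws, with variable names of my own choosing.

```python
from scipy.stats import hypergeom

deck_size, marked, drawn = 52, 13, 4

# One row per possible number of marked cards in the draw.
rows = []
for k in range(drawn + 1):
    rows.append((
        k,
        hypergeom.pmf(k, deck_size, marked, drawn),  # exactly k picked
        hypergeom.cdf(k, deck_size, marked, drawn),  # k or fewer picked
        hypergeom.sf(k, deck_size, marked, drawn),   # more than k picked
    ))

for k, pmf, cdf, sf in rows:
    print(f"{k:2d}  {pmf:.9f}  {cdf:.9f}  {sf:.9f}")
```

In every row, the cdf and sf columns should add up to 1, which is what "flip side" means here.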

Edit2:

Actually, this gives another way of writing the pick_Q function – ‘picking 1 or more marked cards’ can be rephrased as ‘picking more than 0 marked cards’, so

import scipy.stats

def pick_Q(N, M, Q):
    """
    Given a deck of N cards, where M are marked,
    and Q cards are taken randomly without replacement,
    return the probability that at least one marked card is taken.
    """
    return scipy.stats.hypergeom.sf(0, N, M, Q)
