In another thread, I saw the time complexity of a binary-heap weighted random sample

Asked: June 3, 2026

In another thread, I saw that the time complexity of a binary-heap weighted random sample is O(n * log(m)), where n is the number of choices and m is the number of nodes to choose from.

I was wondering about the time complexity of an unweighted random sample, which Python provides as random.sample. Is the time complexity simply O(n), or is it something else entirely?

Editorial Team · Answer 1 · 2026-06-03T16:42:06+00:00

Python source: random.py (line 267).

Here’s the relevant bit:

   315             selected = set()
   316             selected_add = selected.add
   317             for i in range(k):
   318                 j = randbelow(n)
   319                 while j in selected:
   320                     j = randbelow(n)
   321                 selected_add(j)
   322                 result[i] = population[j]

It basically “rolls the dice” for a random index into population. If it gets an index that’s already in the set selected, it re-rolls. Lather, rinse and repeat k times (where k is the number of samples you asked for).

It appears to be O(k) in expectation, where k is the number of samples requested (what the question calls n; in the source, n is the size of the population). There are some optimisations for small populations, but the meat of the thing is the main loop above.
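To get a feel for how often the loop re-rolls, here is a minimal sketch that mirrors the set-based loop above and counts the rolls. The names sample_with_draw_count and expected_draws are made up for this illustration and are not part of the random module, and rng.randrange(n) stands in for the internal randbelow(n). The i-th pick succeeds with probability (n - i) / n, so the expected total number of rolls is the sum of n / (n - i) for i from 0 to k - 1.

```python
import random


def sample_with_draw_count(population, k, rng=random):
    """Pick k distinct items by rejection; also return how many rolls it took.

    Illustrative helper, not part of the standard library.
    """
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population or is negative")
    selected = set()          # indices already chosen
    result = [None] * k
    draws = 0
    for i in range(k):
        j = rng.randrange(n)  # stands in for randbelow(n)
        draws += 1
        while j in selected:  # collision: roll again
            j = rng.randrange(n)
            draws += 1
        selected.add(j)
        result[i] = population[j]
    return result, draws


def expected_draws(n, k):
    """Expected total rolls: the i-th pick needs n / (n - i) rolls on average."""
    return sum(n / (n - i) for i in range(k))


if __name__ == "__main__":
    picked, draws = sample_with_draw_count(range(1000), 10)
    print(len(set(picked)), draws, round(expected_draws(1000, 10), 2))
```

When k is small next to n, the roll count stays very close to k, which is what makes the loop linear in the number of samples rather than in the population size.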

Edit:

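I believe lines 305-313 are a special case for when the number of samples requested, k, is a large proportion of the total population n (more precisely, when n is no larger than the setsize threshold computed from k). Instead of rolling for random elements from the entire population (and re-rolling if we collide with an element we already selected), we copy the population into a list, pool, and explicitly keep the elements we have yet to select at the front of it. We are guaranteed to get a new element every time, but the tradeoff is that we have to build and maintain the list, which costs O(n) up front. Since this branch only runs when n is at most setsize, which is less than 12k + 21, the copy is still O(k). In the other branch n is more than 3k, so each roll collides with probability under 1/3 and the expected number of rolls stays proportional to k.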

If I’m interpreting this wrong, feel free to comment below.

   303         result = [None] * k
   304         setsize = 21        # size of a small set minus size of an empty list
   305         if k > 5:
   306             setsize += 4 ** _ceil(_log(k * 3, 4)) # table size for big sets
   307         if n <= setsize:
   308             # An n-length list is smaller than a k-length set
   309             pool = list(population)
   310             for i in range(k):         # invariant:  non-selected at [0,n-i)
   311                 j = randbelow(n-i)
   312                 result[i] = pool[j]
   313                 pool[j] = pool[n-i-1]   # move non-selected item into vacancy
   314         else:
   315             selected = set()
   316             selected_add = selected.add
   317             for i in range(k):
   318                 j = randbelow(n)
   319                 while j in selected:
   320                     j = randbelow(n)
   321                 selected_add(j)
   322                 result[i] = population[j]
   323         return result
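To see where the two branches switch over, the threshold from lines 304-306 can be pulled out into a small helper. This is only a sketch: pool_threshold is a name invented here, and math.ceil and math.log are assumed to be what _ceil and _log refer to in the listing.

```python
from math import ceil, log


def pool_threshold(k):
    """Largest population size n for which the listing takes the pool branch.

    Hypothetical helper mirroring lines 304-306 of the listing above.
    """
    setsize = 21                             # size of a small set minus size of an empty list
    if k > 5:
        setsize += 4 ** ceil(log(k * 3, 4))  # table size for big sets
    return setsize


if __name__ == "__main__":
    for k in (5, 6, 100):
        print(k, pool_threshold(k))
```

Under these assumptions, pool_threshold(5) is 21, pool_threshold(6) is 85 and pool_threshold(100) is 1045, so the pool is used only for populations up to a small constant multiple of k, and the set-based loop handles anything larger.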
