In another thread, I saw the time complexity of a binary-heap weighted random sample is equal to O(n * log(m)) where n is the number of choices and m is the number of nodes to choose from.
I was wondering about the time complexity of an unweighted random sample, as implemented by Python's random.sample. Is the time complexity simply O(n), or is it something else entirely?
Python source: random.py (line 267).

Here’s the relevant bit:
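As a rough illustration only (this is not the verbatim CPython source), the re-roll loop can be sketched like this. The function name `sample_by_rejection` is hypothetical, and the sketch assumes the population supports `len()` and indexing:

```python
import random


def sample_by_rejection(population, k):
    """Pick k distinct elements by re-rolling whenever an index repeats.

    Illustrative sketch; not the standard library's implementation.
    """
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and len(population)")

    selected = set()  # indices that have already been chosen
    result = []
    for _ in range(k):
        j = random.randrange(n)
        while j in selected:  # collision: roll again
            j = random.randrange(n)
        selected.add(j)
        result.append(population[j])
    return result
```

Set membership tests and insertions are O(1) on average, so each successful pick costs constant time plus however many re-rolls it needed. Re-rolls stay rare as long as k is small compared with the population.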
It basically “rolls the dice” for a random index into population. If it gets an index that’s already in the set selected, it re-rolls. Rinse, lather and repeat k times (where k is the number of samples you asked for).

It appears to be O(k), i.e. linear in the requested number of samples (the n of your question). That is an expected bound rather than a strict one, since every collision costs an extra roll. There are some optimisations for small sets, but the meat of the thing is the main loop above.

Edit:
I believe lines 305-313 are a special case for when the number of samples requested, k, is a large proportion of the total population n. Instead of rolling for random elements from the entire population (and re-rolling if we collide with an element we already selected), we explicitly maintain a list of elements we have yet to select. We are guaranteed to get a new element every time, but the tradeoff is that we have to maintain the list.

If I’m interpreting this wrong, feel free to comment below.
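For comparison, the pool-based special case described in the edit might look roughly like the sketch below. Again, this is an illustration under assumptions rather than the library's code: `sample_from_pool` is a hypothetical name, and the sketch assumes the population supports `len()` and can be copied into a list.

```python
import random


def sample_from_pool(population, k):
    """Pick k distinct elements from a shrinking pool of unpicked items.

    Illustrative sketch; not the standard library's implementation.
    """
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and len(population)")

    pool = list(population)  # O(n) copy: the price of never re-rolling
    result = []
    for i in range(k):
        # Only the first n - i slots of the pool still hold unpicked items.
        j = random.randrange(n - i)
        result.append(pool[j])
        # Fill the hole with the last unpicked item so the live region shrinks by one.
        pool[j] = pool[n - i - 1]
    return result
```

Every iteration does a constant amount of work, so the loop is O(k) with no re-rolls. The up-front copy adds O(n), which is only worthwhile when k is a sizeable fraction of n.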