Consider a problem where a random sublist of k items, Y, must be selected

Question

Asked: June 1, 2026 (2026-06-01T23:47:08+00:00)

Consider a problem where a random sublist of k items, Y, must be selected from X, a list of n items, where the items in Y must appear in the same order as they do in X. The selected items in Y need not be distinct. One solution is this:

for i = 1 to k
    A[i] = floor(rand * n) + 1
    Y[i] = X[A[i]]
sort Y according to the ordering of A

However, this has running time O(k log k) due to the sort operation. To remove the sort, it is tempting to draw the indices in decreasing order, bounding each one by the previous draw:

high_index = n
for i = 1 to k
    index = floor(rand * high_index) + 1
    Y[k - i + 1] = X[index]
    high_index = index

But this gives a clear bias to the returned list, because each index is drawn uniformly from the remaining range. It feels like an O(k) solution is attainable if the indices in the second solution were drawn from a non-uniform distribution instead. Does anyone know if this is the case, and if so, what properties that distribution of the marginal indices must have?
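To make the bias concrete, here is a minimal Python sketch that runs both approaches many times and tabulates how often each sublist of indices appears. The function names (`sample_sorted`, `sample_shrinking`, `frequencies`) are made up for this illustration, and the indices are zero-based rather than one-based as in the pseudocode.

```python
import random
from collections import Counter


def sample_sorted(n, k, rng):
    """First approach: k independent uniform indices, then sort (O(k log k))."""
    return tuple(sorted(rng.randrange(n) for _ in range(k)))


def sample_shrinking(n, k, rng):
    """Second approach: each index is uniform on 1..previous index, filled from the back."""
    out = [0] * k
    high = n
    for i in range(k):
        index = rng.randrange(high) + 1  # uniform on 1..high, as in the pseudocode
        out[k - 1 - i] = index - 1       # store as a zero-based index
        high = index
    return tuple(out)


def frequencies(sampler, n, k, trials, seed=0):
    """Empirical frequency of every index tuple produced by `sampler`."""
    rng = random.Random(seed)
    counts = Counter(sampler(n, k, rng) for _ in range(trials))
    return {key: counts[key] / trials for key in sorted(counts)}


if __name__ == "__main__":
    for sampler in (sample_sorted, sample_shrinking):
        print(sampler.__name__, frequencies(sampler, n=2, k=2, trials=20000))
```

For n = 2 and k = 2 the two methods can be compared by hand: sorting two independent draws yields the index pair (0, 0) with probability 1/4, while the shrinking-range method yields it with probability 1/2, since a first draw of index 1 forces the second draw.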

1 Answer

Editorial Team · Answer 1 · 2026-06-01T23:47:09+00:00

Assuming every possible sublist configuration is to be equally likely, the distribution of the (zero-based) index x in X of the first item in Y is given by:

P(x; n, k) = binomial(n - x + k - 2, k - 1) / norm

where binomial denotes calculation of the binomial coefficient, and norm is a normalisation factor, equal to the total number of possible sublist configurations.

norm = binomial(n + k - 1, k)

So for k = 5 and n = 10 we have:

norm = 2002
P(x = 0) = 0.357, P(x <= 0) = 0.357
P(x = 1) = 0.247, P(x <= 1) = 0.604
P(x = 2) = 0.165, P(x <= 2) = 0.769
P(x = 3) = 0.105, P(x <= 3) = 0.874
P(x = 4) = 0.063, P(x <= 4) = 0.937
… (we can continue this up to x = 9, the last index of X)
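The table can be reproduced with a few lines of Python. This is only a sketch: `first_index_pmf` is a name invented here, and x is taken to be zero-based, as in the table.

```python
from math import comb


def first_index_pmf(n, k):
    """P(x; n, k) for x = 0..n-1: probability that the first item of Y is X[x]."""
    norm = comb(n + k - 1, k)  # number of size-k sublists with repeats allowed
    return [comb(n - x + k - 2, k - 1) / norm for x in range(n)]


pmf = first_index_pmf(10, 5)
cumulative = 0.0
for x, p in enumerate(pmf):
    cumulative += p
    print(f"P(x = {x}) = {p:.3f}, P(x <= {x}) = {cumulative:.3f}")
```

Using `math.comb` keeps the counts as exact integers until the final division, so the probabilities sum to 1 up to floating-point rounding.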

We can sample the X index of the first item in Y from this distribution (call it x1). The second index in Y can then be sampled in the same way from P(x; n - x1, k - 1), where x is now an offset measured from x1, and so on for all subsequent indices.
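The sequential scheme can be sketched in Python as follows. The helper names are hypothetical, indices are zero-based, and each draw inverts the CDF with a linear scan, so this version costs O(n) per item rather than the O(1) the question is hoping for.

```python
import random
from math import comb


def sample_first_offset(n, k, rng):
    """Draw x in 0..n-1 with weight comb(n - x + k - 2, k - 1) by inverting the CDF."""
    target = rng.randrange(comb(n + k - 1, k))  # uniform integer in [0, norm)
    running = 0
    for x in range(n):
        running += comb(n - x + k - 2, k - 1)
        if target < running:
            return x
    raise AssertionError("unreachable: the weights sum to norm")


def sample_ordered_indices(n, k, rng=None):
    """Return k non-decreasing zero-based indices into X, every sublist equally likely."""
    if rng is None:
        rng = random.Random()
    indices = []
    start = 0  # smallest index still allowed
    for remaining in range(k, 0, -1):
        # Offset of the next index, relative to the previously chosen one.
        start += sample_first_offset(n - start, remaining, rng)
        indices.append(start)
    return indices


if __name__ == "__main__":
    X = list("abcdefghij")
    picks = sample_ordered_indices(len(X), 5, random.Random(42))
    print([X[i] for i in picks])
```

Working with integer weights and `randrange` avoids floating-point error in the cumulative sums, at the cost of big-integer arithmetic when n and k are large.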

My feeling now is that the problem is not solvable in O(k), because in general we are unable to sample from the distribution described in constant time. If k = 2 then we can sample in constant time using the quadratic formula (because the number of configurations on m items is then the quadratic 0.5(m^2 + m), which can be inverted in closed form), but I can’t see a way to extend this to all k (my maths isn’t great though).
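For the k = 2 case, one way to realise the quadratic-formula idea is to number all pairs (a, b) with a <= b and invert that numbering directly. The sketch below is an illustration under that assumption, not necessarily the exact derivation intended above; `unrank_pair` and `sample_pair` are made-up names, and indices are zero-based.

```python
import random
from math import isqrt


def unrank_pair(r):
    """Map rank r = b*(b+1)/2 + a (with 0 <= a <= b) back to the pair (a, b)."""
    # b is the floored positive root of b^2 + b - 2r = 0.
    b = (isqrt(8 * r + 1) - 1) // 2
    a = r - b * (b + 1) // 2
    return a, b


def sample_pair(n, rng=None):
    """Draw zero-based indices a <= b < n, each of the n(n+1)/2 pairs equally likely."""
    if rng is None:
        rng = random.Random()
    total = n * (n + 1) // 2
    return unrank_pair(rng.randrange(total))


if __name__ == "__main__":
    print(sample_pair(10, random.Random(7)))
```

`math.isqrt` is used instead of a floating-point square root so that the result stays exact for large ranks.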
