So I recently had this as an interview question and I was wondering what

Asked: June 17, 2026 (2026-06-17T22:42:51+00:00)

So I recently had this as an interview question, and I was wondering what the optimal solution would be. The code is in Objective-C.

Say we have a very large data set, and we want to get a random sample
of items from it for testing a new tool. Rather than worry about the
specifics of accessing things, let’s assume the system provides these
things:

// Return a random number from the set 0, 1, 2, ..., n-2, n-1.
int Rand(int n);

// Interface to implementations other people write.
@interface Dataset : NSObject

// YES when there is no more data.
- (BOOL)endOfData;

// Get the next element and move forward.
- (NSString*)getNext;

@end


// This function reads elements from |input| until the end, and
// returns an array of |k| randomly-selected elements.
- (NSArray*)getSamples:(unsigned)k from:(Dataset*)input
{
  // Describe how this works.
}

Edit: The goal is to randomly select items from the given dataset. For example, if k = 5, I want to randomly select 5 elements from the dataset and return an array of those items. Each element in the dataset must have an equal chance of being selected.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T22:42:53+00:00

This seems like a good time to use Reservoir Sampling. The following is an Objective-C adaptation for this use case:

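NSMutableArray* result = [[NSMutableArray alloc] initWithCapacity:k];

unsigned i, j;

// Fill the reservoir with the first k elements (fewer if the data runs out).
for (i = 0; i < k && ![input endOfData]; i++) {
    [result addObject:[input getNext]];
}

// Element i (0-based) replaces a random slot with probability k / (i + 1).
for (i = k; ![input endOfData]; i++) {
    // Rand(i + 1) is uniform over 0..i, so j < k with probability k / (i + 1).
    j = (unsigned)Rand((int)(i + 1));

    NSString* next = [input getNext];

    if (j < k) {
        [result setObject:next atIndexedSubscript:j];
    }
}

return result;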
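
Since the question asks to describe how this works, here is a short sketch of why every element should end up in the sample with the same probability. It assumes the dataset holds n >= k elements, counts positions from 1, and assumes that the step handling the t-th element (for t > k) draws a slot index uniformly from t possible values, i.e. Rand(t), and overwrites that slot only when the index is below k. The symbols n, m, and t are introduced here for illustration only.

```latex
% Element m enters the reservoir with probability min(1, k/m).
% At each later step t, it is evicted only if the t-th element is kept
% (probability k/t) AND lands on its slot (probability 1/k).
\[
P(\text{element } m \text{ is in the final sample})
  = \min\!\left(1, \frac{k}{m}\right)
    \prod_{t=\max(k,m)+1}^{n} \left(1 - \frac{k}{t}\cdot\frac{1}{k}\right)
  = \frac{k}{\max(k,m)} \prod_{t=\max(k,m)+1}^{n} \frac{t-1}{t}
  = \frac{k}{\max(k,m)} \cdot \frac{\max(k,m)}{n}
  = \frac{k}{n}.
\]
```

The product telescopes, so the result does not depend on m: early and late elements are equally likely to survive.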

The code above is not the most efficient reservoir sampling algorithm, because it generates a random number for every input element after the first k. Slightly more complex algorithms in the general category of “reservoir sampling” instead compute how many elements to skip before the next replacement. Vitter's paper describing “Algorithm Z” is an interesting read on this. I would be curious whether people have found newer literature on reservoir sampling, too, because that paper was published in 1985.
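
For anyone who wants to try the basic (one random number per element) approach outside of the Dataset interface, here is a minimal sketch in plain C, which also compiles as Objective-C. It is only an illustration under stated assumptions: an int array stands in for the Dataset, and the names rand_below and reservoir_sample are made up for this example (rand_below plays the role of the question's Rand).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the question's Rand(n): returns a value in
 * 0..n-1. Rejection sampling avoids the modulo bias of rand() % n. */
static int rand_below(int n) {
    assert(n > 0 && n <= RAND_MAX);
    /* Largest multiple of n that does not exceed RAND_MAX. */
    int limit = RAND_MAX - (RAND_MAX % n);
    int r;
    do {
        r = rand();
    } while (r >= limit);
    return r % n;
}

/* Fills out[0..k-1] with a uniform random sample of data[0..n-1] and
 * returns the number of slots actually filled, i.e. min(k, n). */
static size_t reservoir_sample(const int *data, size_t n, int *out, size_t k) {
    size_t filled = 0;
    for (size_t i = 0; i < n; i++) {
        if (filled < k) {
            /* The first k elements go straight into the reservoir. */
            out[filled++] = data[i];
        } else {
            /* Element i (0-based) is kept with probability k / (i + 1). */
            size_t j = (size_t)rand_below((int)(i + 1));
            if (j < k) {
                out[j] = data[i];
            }
        }
    }
    return filled;
}
```

Seed the generator once with srand before sampling. Calling reservoir_sample many times on a small array and counting how often each value appears is a quick sanity check that the selection frequencies come out roughly equal.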
