Here’s a question for interview crackers:
Given that you are receiving samples from an instrument at a constant rate, and you have constant storage space, how would you design a storage algorithm that lets me get a representative readout of the data, no matter when I look at it? In other words, the readout should be representative of the behavior of the system to date.
I couldn’t come up with an approach, so I am looking for ideas.
Assume that you have the memory to store k elements. Store the first k elements in an array. When you receive the nth element (where n > k), generate a random integer r between 1 and n, inclusive. If r > k, discard the nth element. Otherwise, replace the rth element in the array with the nth element.

This approach ensures that at any stage the array contains k elements selected uniformly at random from the input elements received so far.

Proof

We can show by induction that the k representative elements at any stage are distributed uniformly at random. Assume that after receiving n-1 elements, any element is present in the array with probability k/(n-1).

After receiving the nth element, the probability that it is inserted into the array = k/n.

For any other element, the probability that it is represented in the current iteration = probability that it is represented in the previous iteration * probability that it is not replaced in the current iteration = k/(n-1) * (1 - 1/n) = k/n, since a given slot is overwritten only when r equals its index, which happens with probability 1/n.
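The scheme described above is commonly known as reservoir sampling. As a rough illustration, here is a minimal Python sketch of it. The names reservoir_sample and inclusion_frequencies, and the rng and seed parameters, are made up for this example and are not part of the original answer. The second function is only an empirical sanity check on the k/n claim, not a substitute for the proof.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, rng=random):
    """Keep at most k items from stream, following the steps described above."""
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the array directly.
            reservoir.append(item)
        else:
            # Draw r uniformly from 1..n (both ends inclusive).
            r = rng.randint(1, n)
            if r <= k:
                # Overwrite the r-th slot (1-based), i.e. list index r - 1.
                reservoir[r - 1] = item
            # If r > k, the n-th item is simply discarded.
    return reservoir


def inclusion_frequencies(n_items, k, trials, seed=0):
    """Estimate how often each of n_items ends up in the reservoir."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    counts = Counter()
    for _ in range(trials):
        counts.update(reservoir_sample(range(n_items), k, rng))
    return {item: counts[item] / trials for item in range(n_items)}
```

If the argument above holds, calling inclusion_frequencies(10, 3, 20000) should give a value close to 3/10 for every item, early or late in the stream. Note that the array only ever holds k items, so the storage stays constant however long the instrument runs.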