Given a sequence of a trillion real numbers on disk…
How would you compute the running median of every thousand consecutive entries? That is, the first point would be the median of a[0], …, a[999], the second point the median of a[1], …, a[1000], the third point the median of a[2], …, a[1001], and so on.
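As a reference point, here is a brute-force Python sketch of the definition above. The function name `running_median_bruteforce` and the window parameter `k` are hypothetical. The question does not say which middle value to use for an even window such as 1000, so this sketch assumes the convention of Python's `statistics.median`, which averages the two middle values. Because it re-sorts every window from scratch, it is only useful as a correctness check on small inputs. It is not meant for a trillion numbers.

```python
from statistics import median


def running_median_bruteforce(a, k=1000):
    """Point i is the median of a[i], ..., a[i+k-1].

    Each window is re-sorted from scratch (O(k log k) per point), so this
    is only a reference definition for checking faster versions.
    """
    # n inputs produce n - k + 1 windows (none if n < k)
    return [median(a[i:i + k]) for i in range(len(a) - k + 1)]
```

For example, `running_median_bruteforce([5, 1, 4, 2, 3], k=3)` returns `[4, 2, 3]`, the medians of `[5, 1, 4]`, `[1, 4, 2]` and `[4, 2, 3]`.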
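The naive solution is actually not so bad. Keep a sorted list of the 1000 numbers in the current window in memory. Every time you step to the next index, remove a[i-1] from the sorted list and add a[i+999] to it. Once the window is sorted, the median can be read directly from the middle of the list. Each step costs O(log 1000) to locate positions by binary search, plus up to O(1000) to shift elements if the sorted list is stored as an array.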
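The naive sorted-window approach just described could look roughly like this in Python. This is a minimal sketch under a few assumptions. The numbers arrive as an iterable, for example read lazily from the file on disk. The names `running_median` and `k` are hypothetical. An even window averages its two middle values. A `deque` records arrival order so we know which value, a[i-1], leaves the window. A plain list kept sorted with the standard `bisect` module holds the same values.

```python
from bisect import bisect_left, insort
from collections import deque


def running_median(stream, k=1000):
    """Yield the median of each window of k consecutive values from stream.

    Naive approach: keep the current window sorted. Each step does binary
    searches (O(log k)) plus list inserts/deletes that shift up to k items.
    """
    window = deque()   # window values in arrival order
    sorted_win = []    # the same values, kept in sorted order
    for x in stream:
        if len(window) == k:
            old = window.popleft()                        # a[i-1] leaves the window
            del sorted_win[bisect_left(sorted_win, old)]  # remove one copy of it
        window.append(x)                                  # a[i+k-1] enters
        insort(sorted_win, x)
        if len(window) == k:
            mid = k // 2
            if k % 2:
                yield sorted_win[mid]
            else:
                # even k (such as 1000): average the two middle values
                yield (sorted_win[mid - 1] + sorted_win[mid]) / 2
```

For example, `list(running_median([5, 1, 4, 2, 3], k=3))` gives `[4, 2, 3]`. Memory stays at O(k) no matter how long the stream is. Over about 10^12 steps, however, the per-step list shifting is the cost a better solution would need to reduce.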
The question is: how can you do better?