Given a sequence of data (it may have duplicates), a fixed-sized moving
window, move the window at each iteration from the start of the data
sequence, such that
(1) the oldest data element is removed from the window and a new data
element is pushed into the window
(2) find the median of the data inside the window at each moving.
The following posts are not helpful.
Effectively to find the median value of a random sequence
joining data based on a moving time window in R
My idea:
Use 2 heaps to hold median. In side the window, sort the data in the window
in the first iteration, the min heap holds the larger part and the max heap
holds the smaller part. If the window has odd number of data, the max heap
returns the median otherwise the arithmetic mean of the top elements of the
two heaps is the median.
When a new data is pushed in to the window, remove the oldest data from one
of the heap and compare the new data with the top of max and min heap so
that to decide which heap the data to be put. Then, find the median just
like in the first iteration.
But, how to find a data element in a heap is a problem. Heap is a binary
tree not a binary search tree.
Is it possible to solve it with O(n) or O(n * lg m) where m is the window size and
space: O(1) ?
Any help is really appreciated.
Thanks
O(n*lg m) is easy:
Just maintain your window as two
std::sets, one for the lower half, one for the upper half. Insertion of a new element costs O(lg m), finding and removal of an old element costs the same. Determining the median using the method you described in your question costs O(1).As you slide the window over your sequence, in each iteration you remove the item falling out of the window (O(lg m)), insert the new item (O(lg m)) and compute the median (O(1)), resulting in a total of O(n lg m).
This solution uses space O(m), of course but I don’t think you can get away without storing the window’s contents.