I have a stream whose length I do not know (in theory it may be infinite).
I read the stream’s elements one by one.
Every time an element is read from the stream, I want to be able to return the kth greatest element read so far.
(Ideally the code would be in Python and/or Lisp/Scheme.)
K is read at the beginning. K can be a number (3rd, 4th) or a percentage (K% of the total number of elements read so far). If K = 1/2, that means extracting the median element each time. For example, after reading the Nth element, it must return the (N/2)th greatest element.
Example with K = 1/2:
3 -> 3
3,4 -> 3
3,4,2 -> 3
3,4,2,1 -> 2
etc.
I think this example is enough to clarify the question. I need the minimum possible time to extract the Kth element. (This assumes reading from the stream takes O(1), followed by inserting the value just read and then extracting the Kth element.)
I want any solution better than O(n).
Since you need the k-th element and k is known before the algorithm runs, the first observation is that you need to store at most k elements: the k smallest seen so far. (That yields the k-th smallest; for the k-th greatest, keep the k largest instead and use a min-heap in place of the max-heap.)
When you read a new element, you need to insert it into some data structure that preserves its invariants and lets you retrieve the answer quickly.
1) You may use a max-heap holding at most k elements. Read an element and insert it into the heap, O(log k). Then, if the heap holds more than k elements (k+1, to be precise), call extract_max, O(log k), to remove the largest and restore the heap. The answer is then at the top of the heap, accessible in O(1).
So each element costs O(log k) to get the k-th element, and O(n log k) in total for n elements.
2) When k is a percentage, the rank of the element is computed dynamically from how many elements have been processed so far, so the k-element bound from 1) no longer applies. Here you can use an order statistic tree (http://en.wikipedia.org/wiki/Order_statistic_tree), which gives O(log n) insertion and O(log n) lookup, where n is the number of elements read so far.
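For approach 1), here is a minimal Python sketch, assuming numeric elements (values are negated so that the standard `heapq` min-heap behaves as a max-heap). The class name `KthSmallest` and its `add` method are made up for illustration. Instead of pushing and then extracting, it compares the new value with the top of the heap and replaces the top when needed, which has the same O(log k) cost.

```python
import heapq


class KthSmallest:
    """Track the k-th smallest element of a stream with a bounded max-heap."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        # heapq is a min-heap, so values are stored negated (assumes numbers).
        self._heap = []

    def add(self, x):
        """Insert x; return the k-th smallest so far, or None if fewer than k read."""
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -x)
        elif x < -self._heap[0]:
            # x belongs among the k smallest: swap it for the current maximum.
            heapq.heapreplace(self._heap, -x)
        if len(self._heap) < self.k:
            return None
        return -self._heap[0]


tracker = KthSmallest(3)
print([tracker.add(x) for x in [5, 1, 4, 2, 3]])  # [None, None, 5, 4, 3]
```

For the k-th greatest, the mirrored version would store the values un-negated, keep the k largest, and replace the top whenever the new value is larger than it.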
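For approach 2), Python's standard library has no order statistic tree, so the sketch below swaps in a treap (a randomized balanced BST) augmented with subtree sizes. Insertion and selection take expected O(log n). All names here (`OrderStatisticTreap`, `select`, `running_quantile`) are hypothetical. The rank convention, ceil(fraction * n) counted from the smallest, is an assumption chosen because it reproduces the K = 1/2 example in the question; other rounding rules are equally possible.

```python
import math
import random


class _Node:
    __slots__ = ("key", "prio", "left", "right", "size")

    def __init__(self, key):
        self.key = key
        self.prio = random.random()  # random heap priority keeps the tree balanced
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in this subtree


def _size(node):
    return node.size if node else 0


def _update(node):
    node.size = 1 + _size(node.left) + _size(node.right)


def _split(node, key):
    """Split a treap into (keys <= key, keys > key)."""
    if node is None:
        return None, None
    if node.key <= key:
        left, right = _split(node.right, key)
        node.right = left
        _update(node)
        return node, right
    left, right = _split(node.left, key)
    node.left = right
    _update(node)
    return left, node


def _merge(a, b):
    """Merge two treaps where every key in a is <= every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = _merge(a.right, b)
        _update(a)
        return a
    b.left = _merge(a, b.left)
    _update(b)
    return b


class OrderStatisticTreap:
    def __init__(self):
        self._root = None

    def __len__(self):
        return _size(self._root)

    def insert(self, key):
        left, right = _split(self._root, key)
        self._root = _merge(_merge(left, _Node(key)), right)

    def select(self, rank):
        """Return the rank-th smallest key (1-based)."""
        if not 1 <= rank <= len(self):
            raise IndexError("rank out of range")
        node = self._root
        while True:
            left_size = _size(node.left)
            if rank <= left_size:
                node = node.left
            elif rank == left_size + 1:
                return node.key
            else:
                rank -= left_size + 1
                node = node.right


def running_quantile(stream, fraction):
    """After each read, yield the ceil(fraction * n)-th smallest of the n elements."""
    tree = OrderStatisticTreap()
    for x in stream:
        tree.insert(x)
        rank = max(1, math.ceil(fraction * len(tree)))
        yield tree.select(rank)


print(list(running_quantile([3, 4, 2, 1], 0.5)))  # [3, 3, 3, 2]
```

Unlike the heap approach, this keeps every element read so far, so memory grows with the stream.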