I am working on a time series of numeric values, such as those produced

Question

0

Asked: May 25, 20262026-05-25T13:06:59+00:00 2026-05-25T13:06:59+00:00

I am working on a time series of numeric values, such as those produced

0

I am working on a time series of numeric values, such as those produced by a temperature sensor. I’d like to filter those values, roughly selecting those samples that form e.g. the top 10% of the received values.

The obvious solution of recording all samples and using any well-known algorithm for the extraction of the k-highest values is not possible in my case for two reasons:

The series may be infinite, memory is definitely not.
I’d like this algorithm to be usable in real-time, or at least in a streaming mode with predetermined latency.

The distribution of the values is not normal, nor is it consistent with any well-known distribution that I know of. Metrics that I already have available at any time include the mean, the variance and the skewness of the values that have already been received.

Unlike this question, I do not need perfect accuracy, although I would like to be able to tune the parameters of the selection algorithm.

I believe something similar is used in single-pass variable bit-rate (VBR) media codecs to allocate the available bandwidth to each frame, by determining the number of available bits. Unfortunately all the VBR algorithms I studied are too focused on DSP and media streams for me to understand and/or implement.

Are there any known algorithms that could help me deal with this issue? Any hints that would orient me towards the right direction would be greatly appreciated.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-25T13:06:59+00:00

If you decide you are only interested in the last 10N items you can use two heaps, one of size N and one of size 9N, to keep track of the N highest items in the last 10N. When you see a new item first remove the oldest item. If it came from the small heap, take the largest item from the large heap and put it in the small. Now look at the new item and either put it straight in the large heap or take the smallest item from the small heap and transfer it to the large one before putting the new item in the small heap.

At any time you have a small heap full of high items and a large heap full of low items, and you know whether the latest item was in the top 10% of those 10N.

But is this really what you want? Note that if your samples rise steadily and then fall steadily over a period of time much larger than your 10N samples then nearly half the time the latest item will be in the top 10% – in fact it will be the largest item seen in the memory of 10N items.

There have been academic papers on finding approximaste quantiles of streaming data. One such is “Effective Computation of Biased Quantiles over Data Streams”, by Cormode, Korn, Muthukrishnan, and Srivastava

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am working on a time series of numeric values, such as those produced

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply