I need to count the quantiles for a large set of data. Let’s assume

Question

0

Asked: May 15, 20262026-05-15T00:17:21+00:00 2026-05-15T00:17:21+00:00

I need to count the quantiles for a large set of data. Let’s assume

0

I need to count the quantiles for a large set of data.

Let’s assume we can get the data only through some portions (i.e. one row of a large matrix). To count the Q3 quantile one need to get all the portions of the data and store it somewhere, then sort it and count the quantile:

List<double> allData = new List<double>();
// This is only an example; the portions of data are not really rows of some matrix
foreach(var row in matrix) 
{
    allData.AddRange(row);
}

allData.Sort();
double p = 0.75 * allData.Count;
int idQ3 = (int)Math.Ceiling(p) - 1;
double Q3 = allData[idQ3];

I would like to find a way of obtaining the quantile without storing the data in an intermediate variable. The best solution would be to count some parameters of mid-results for first row and then adjust it step by step for next rows.

Note:

These datasets are really big (ca 5000 elements in each row)
The Q3 can be estimated, it doesn’t have to be an exact value.
I call the portions of data “rows”, but they can have different leghts! Usually it varies not so much (+/- few hundred samples) but it varies!

This question is similar to “On-line” (iterator) algorithms for estimating statistical median, mode, skewness, kurtosis, but I need to count quantiles.

ALso there are few articles in this topic, i.e.:

Before trying to implement these approaches, I wondered if there are maybe any other, quicker ways of counting the 0.25/0.75 quantiles?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-15T00:17:22+00:00

Inspired by this answer I created a method that estimates the quantiles quite good. It is approximation close enough for my purposes.

The idea is following: the 0.75 quantile is in fact a median of all values that lies above the global median. And respectively, 0.25 quantile is a median of all values below the global median.

So if we can approximate the median, we can in similar way approximate the quantiles.

double median = 0;
double q1 = 0;
double q3 = 0;
double eta = 0.005;

foreach( var value in listOfValues) // or stream, or any other large set of data...
{
    median += eta * Math.Sign(p.Int - median);
}
// Second pass. We know the median, so we can count the quantiles.
foreach(var value in listOfValues)
{ 
    if(p.Int < median)
        q1 += eta*Math.Sign(p.Int - q1);
    else
        q3 += eta*Math.Sign(p.Int - q3);
}

Remarks:

If distribution of your data is strange, you will need to have bigger eta in order to fit to the strange data. But the accuracy will be worse.
If the distribution is strange, but you know the total size of your collection (i.e. N) you can adjust the eta parameter in this way: at the beggining set the eta to be almost equal some large value (i.e. 0.2). As the loop passes, lower the value of eta so when you reach almost the end of the collection, the eta will be almost equal 0 (for example, in loop compute it like that: eta = 0.2 - 0.2*(i/N);

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I need to count the quantiles for a large set of data. Let’s assume

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply