I need to find the n lowest values (which are not 0) in an array of doubles (let’s call the array samples). I need to do this many times in a loop, so execution speed is crucial. I first tried sorting the array and then taking the first 10 values (which are not 0). However, although Array.Sort is said to be fast, it became the bottleneck:
const int numLowestSamples = 10;
double[] samples;
double[] lowestSamples = new double[numLowestSamples];
for (int count = 0; count < iterations; count++) // iterations typically around 2600000
{
    samples = whatever;
    Array.Sort(samples);
    lowestSamples = samples.SkipWhile(x => x == 0).Take(numLowestSamples).ToArray();
}
So I tried a different, but less clean, solution: first read in the first n non-zero values and sort them, then loop through all the other values in samples, checking whether each value is smaller than the last (highest) value in the sorted lowestSamples list. If it is, it replaces that last value and the list is sorted again. This turned out to be approximately 5 times faster:
const int numLowestSamples = 10;
double[] samples;
List<double> lowestSamples = new List<double>(numLowestSamples);
for (int count = 0; count < iterations; count++) // iterations typically around 2600000
{
    samples = whatever;
    lowestSamples.Clear();
    // Read the first n non-zero values (stop early if samples runs out)
    int i = 0;
    do
    {
        if (samples[i] > 0)
            lowestSamples.Add(samples[i]);
        i++;
    } while (lowestSamples.Count < numLowestSamples && i < samples.Length);
    // Sort the list
    lowestSamples.Sort();
    // Continue at i, the first sample not read yet; starting at numLowestSamples
    // would re-read samples whenever zeros were skipped above
    for (int j = i; j < samples.Length; j++) // samples.Length is typically 3600
    {
        // if value is larger than 0, but lower than last/highest value in lowestSamples
        // write value to list (replacing the last/highest value), then sort list so
        // last value in list still is the highest
        if (samples[j] > 0 && samples[j] < lowestSamples[numLowestSamples - 1])
        {
            lowestSamples[numLowestSamples - 1] = samples[j];
            lowestSamples.Sort();
        }
    }
}
Although this works relatively fast, I wanted to challenge anyone to come up with an even faster and better solution.
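Instead of repeatedly sorting lowestSamples, just insert the sample where it would sit: walk down from the end of the sorted list, shifting every larger value up one slot (which drops the old highest value), and write the new sample into the gap.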
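A minimal sketch of that insertion step, not necessarily the exact code originally intended here. It assumes "not 0" means greater than 0, as in the question's second snippet, and it uses a plain double[] rather than a List<double>. The names LowestSamples and NLowest are hypothetical.

```csharp
using System;

public static class LowestSamples
{
    // Returns the n lowest values greater than 0, in ascending order
    // (fewer than n if samples does not contain that many).
    public static double[] NLowest(double[] samples, int n)
    {
        if (n <= 0) return Array.Empty<double>();

        var lowest = new double[n];
        int count = 0; // number of valid entries in lowest

        foreach (double s in samples)
        {
            if (s <= 0) continue;                           // skip zeros
            if (count == n && s >= lowest[n - 1]) continue; // not among the n lowest

            // While still filling, open a new slot at the end;
            // once full, start at the slot holding the current maximum.
            int k = count < n ? count++ : n - 1;

            // Shift larger entries one slot to the right to make room.
            while (k > 0 && lowest[k - 1] > s)
            {
                lowest[k] = lowest[k - 1];
                k--;
            }
            lowest[k] = s;
        }

        if (count < n) Array.Resize(ref lowest, count);
        return lowest;
    }
}
```

Most samples fail the comparison against lowest[n - 1] and cost a single check, and the ones that pass move at most n - 1 elements, so no call to Sort is needed inside the loop. To avoid allocating in a hot loop, the result buffer could be passed in and reused.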
Now, if numLowestSamples needs to be quite large (approaching samples.Length), you may want to use a priority queue instead, which may be faster: inserting a new sample is generally O(log n) rather than O(n) for shifting elements (about n/2 moves on average), where n is numLowestSamples. The priority queue can insert the new value and knock off the largest value in O(log n) time.
With numLowestSamples at 10, there’s really no need for it, especially since you’re only dealing with doubles and not a complex data structure. With a heap (most priority queues use heaps) and a small numLowestSamples, the overhead of allocating memory for the heap nodes would probably outweigh any searching/inserting efficiency gains, but testing is important.
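For the large-n case, a priority-queue version might look like the sketch below. It assumes .NET 6 or later, where System.Collections.Generic.PriorityQueue<TElement, TPriority> is available, and again treats "not 0" as greater than 0. The names LowestSamplesHeap and NLowest are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LowestSamplesHeap
{
    // Returns the n lowest values greater than 0, in ascending order
    // (fewer than n if samples does not contain that many).
    public static double[] NLowest(double[] samples, int n)
    {
        if (n <= 0) return Array.Empty<double>();

        // PriorityQueue dequeues the smallest priority first; reversing the
        // comparer makes it hand back the largest value instead (a max-heap).
        var heap = new PriorityQueue<double, double>(
            n, Comparer<double>.Create((a, b) => b.CompareTo(a)));

        foreach (double s in samples)
        {
            if (s <= 0) continue;          // skip zeros
            if (heap.Count < n)
                heap.Enqueue(s, s);        // still filling up
            else if (s < heap.Peek())
                heap.EnqueueDequeue(s, s); // add s, drop the current largest
        }

        // Dequeue yields the largest value first, so fill the result from the back.
        var result = new double[heap.Count];
        for (int i = result.Length - 1; i >= 0; i--)
            result[i] = heap.Dequeue();
        return result;
    }
}
```

Whether this beats the simple insertion approach depends on n and on the data, so it is worth benchmarking both with realistic inputs before switching.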