I need to take a large array of doubles and process it in chunks using a processor-intensive function.
My original array is extremely large: about 200 MB of doubles of signal data.
I need to take it in chunks of 5000 doubles each and process each chunk with some very processor-intensive math, using a function that returns a single double. The results of those function calls are needed, in order, to build an array that is used later.
I think this would be a good fit for parallelism using PLINQ, but I am not quite sure how to go about it.
The naive implementation that I wrote looks like this:

```csharp
using System.Collections.Generic;

var processedList = new List<double>();
var chunk = new List<double>();
foreach (var rawSample in drop.RawSamples)
{
    chunk.Add(rawSample);
    if (chunk.Count == 5000)
    {
        // Do long processing here (this runs on a single thread)
        processedList.Add(LongProcessingFunction(chunk));
        chunk.Clear();
    }
}
// Trailing samples that don't fill a whole chunk are never processed.
// Do something later with the list of processed values...
```
So, where do I start with PLINQ? I need to be able to do the long, intensive function using all the cores of the processor.
I see there is a `Take(n)` function for `IEnumerable<T>`. Can I use this?
Can I use `AsParallel()` here?
Thanks!
First, if you're dealing with this amount of data, you should avoid processing it element by element if you can. In your code, you could do that by iterating over integers in increments of 5000 and using something like `Array.Copy()`.

Or, even better, don't do any copying at all: let `LongProcessingFunction` accept an array (or `IList<T>`, or `IReadOnlyList<T>` if you're on .NET 4.5; but using an interface does have some overhead) and an offset into that array.

If you then want to make your code parallel, you can use `ParallelEnumerable.Range()` with `AsOrdered()` (which is necessary to have the results in the correct order) and `Select()`:
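Here is a minimal sketch of the suggestions above, written as a C# 9+ top-level program. The question doesn't show the real function, so this assumes a hypothetical `LongProcessingFunction(double[] data, int offset, int count)` signature whose body just averages the chunk in place of the expensive math. It also uses made-up sample data instead of `drop.RawSamples`. The program runs a sequential `Array.Copy()` loop and a parallel `ParallelEnumerable.Range().AsOrdered().Select()` query over chunk indices, and both should produce the same in-order results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int ChunkSize = 5000;

// Hypothetical stand-in for the real LongProcessingFunction: it takes the
// whole array plus an offset instead of a copied chunk. It only averages
// the samples; the real one would do the expensive math.
static double LongProcessingFunction(double[] data, int offset, int count)
{
    double sum = 0;
    for (int i = offset; i < offset + count; i++)
        sum += data[i];
    return sum / count;
}

// Hypothetical data standing in for drop.RawSamples: 20 full chunks plus
// 123 leftover samples. Every sample in chunk k has the value k.
double[] rawSamples = Enumerable.Range(0, ChunkSize * 20 + 123)
                                .Select(i => (double)(i / ChunkSize))
                                .ToArray();

// Only full chunks are processed, like the loop in the question.
int chunkCount = rawSamples.Length / ChunkSize;

// 1) Sequential: copy each chunk into a reusable buffer with Array.Copy().
double[] buffer = new double[ChunkSize];
var sequential = new List<double>(chunkCount);
for (int start = 0; start + ChunkSize <= rawSamples.Length; start += ChunkSize)
{
    Array.Copy(rawSamples, start, buffer, 0, ChunkSize);
    sequential.Add(LongProcessingFunction(buffer, 0, ChunkSize));
}

// 2) Parallel, no copying: each work item is a chunk index, and the
//    function reads its chunk directly from the shared array.
double[] processed = ParallelEnumerable.Range(0, chunkCount)
    .AsOrdered()   // results come back in chunk order, not completion order
    .Select(chunkIndex => LongProcessingFunction(rawSamples, chunkIndex * ChunkSize, ChunkSize))
    .ToArray();

Console.WriteLine(string.Join(", ", processed.Take(5))); // prints "0, 1, 2, 3, 4"
```

Each work item is a whole 5000-sample chunk, so PLINQ's per-item overhead should be small compared to the processing itself. The chunks only read from the shared array, so no locking is needed.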