I’m trying to come up with a way to speed up combining massive amounts of objects contained in Lists in the quickest fashion possible. Hoping to take advantage of PLINQ I attempted it but this is not a thread-safe solution. I have tested in VS2010 and VS11Beta in 4.0 and 4.5. This is my sample app. If you change BlowUp() between 1-500 it will usually work. After 500 the wheels come of the track. It will fail in multiple places. Does anyone know the fastest way to solve this problem? (Multidimensional array + PLINQ?)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace PLinqBlowsUp
{
class Program
{
static void Main(string[] args)
{
BlowUp(5000);
}
private static void BlowUp(int blowupNum)
{
try
{
var theExistingMasterListOfAllRowsOfData = new List<List<KeyValuePair<string, dynamic>>>();
//Add some test data
Enumerable.Range(0, blowupNum).AsParallel().ForAll(row => theExistingMasterListOfAllRowsOfData.Add(AddRowToMasterList(row)));
var aNewRowOfData = new List<KeyValuePair<string, dynamic>>();
//Add some test data
var column = new KeyValuePair<string, dynamic>("Title", "MyTitle");
aNewRowOfData.Add(column);
var anotherNewRowOfData = new List<KeyValuePair<string, dynamic>>();
//Add some test data
var columnA = new KeyValuePair<string, dynamic>("Date", DateTime.Now);
var columnB = new KeyValuePair<string, dynamic>("ImportantColumn", "ImportantData");
var columnC = new KeyValuePair<string, dynamic>("VeryImportantColumn", "VeryImportantData");
anotherNewRowOfData.Add(columnA);
anotherNewRowOfData.Add(columnB);
anotherNewRowOfData.Add(columnC);
//Now the Problem
aNewRowOfData.AsParallel().ForAll(anrod => theExistingMasterListOfAllRowsOfData.ForEach(temloarod => temloarod.Add(anrod)));
anotherNewRowOfData.AsParallel().ForAll(anrod => theExistingMasterListOfAllRowsOfData.ForEach(temloarod => temloarod.Add(anrod)));
//Test for number
foreach (var masterRow in theExistingMasterListOfAllRowsOfData)
{
if (masterRow.Count != 7)
throw new Exception("BLOW UP!!!");
}
}
catch (AggregateException ex)
{
Console.WriteLine(ex.Message);
}
}
private static List<KeyValuePair<string, dynamic>> AddRowToMasterList(int row)
{
var columnA = new KeyValuePair<string, dynamic>("FirstName", "John" + row.ToString());
var columnB = new KeyValuePair<string, dynamic>("LastName", "Smith" + row.ToString());
var columnC = new KeyValuePair<string, dynamic>("Ssn", 123456789 + (row*10));
var list = new List<KeyValuePair<string, dynamic>>();
list.Add(columnA);
list.Add(columnB);
list.Add(columnC);
return list;
}
}
}
I see two problems.
Addon thetheExistingMasterListOfAllRowsOfDatainstance from more than one thread without any attempt to synchronize access to it.Addon the individualList<KeyValuePair<string, dynamic>>items from more than one thread without any attempt to synchronize them.You could use
lockto protect theAddmethods or use aConcurrentBaginstead. However, neither of these options are that good. The issue here is that this kind of operation cannot be parallelized very well because all of the threads will end up competing for the same lock. I highly suspect that even the low-lockConcurrentBagwill end up being slower than had you just punted on PLINQ and done everything on the main thread to begin with.