I’m trying to add some information to a DataTable in parallel, but if the loop is too long it freezes or just takes a lot of time, more time than a usual for loop. This is my code for the Parallel.For loop:
Parallel.For(1, linii.Length, index =>
{
    DataRow drRow = dtResult.NewRow();
    alResult = CSVParser(linii[index], txtDelimiter, txtQualifier);
    for (int i = 0; i < alResult.Count; i++)
    {
        drRow[i] = alResult[i];
    }
    dtResult.Rows.Add(drRow);
});
What’s wrong? Why does this Parallel.For loop take much more time than a normal one?
Thanks!
You can’t mutate a DataTable from 2 different threads; it will error. DataTable makes no attempt to be thread-safe. So: don’t do that. Just do this from one thread. Most likely you are limited by IO, so you should just do it on a single thread as a stream. It looks like you’re processing text data. You seem to have a string[] for lines, perhaps from File.ReadAllLines()? Well, that is very bad here: the whole file has to be loaded into memory, and nothing can be processed until all of it has been read.
What you should do is use something like the CsvReader from Code Project, but even if you want to just use one line at a time, use a StreamReader.
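A minimal sketch of that single-threaded, streaming approach might look like the following. The `CsvLoader` class, its `Load` methods, and the `ParseLine` helper are hypothetical names introduced for illustration; `ParseLine` is a naive stand-in for the question's `CSVParser` (whose signature is not shown) and ignores qualifiers. Columns are created on demand here only because the question does not show how `dtResult` is set up.

```csharp
using System;
using System.Data;
using System.IO;

public static class CsvLoader
{
    // Hypothetical stand-in for the question's CSVParser: a naive split that
    // ignores qualifiers (quoted fields). Swap in the real parser here.
    static string[] ParseLine(string line, char delimiter)
    {
        return line.Split(delimiter);
    }

    // Reads one line at a time and appends one row per line,
    // all on the calling thread.
    public static DataTable Load(TextReader reader, char delimiter)
    {
        var table = new DataTable();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Declared inside the loop, so nothing is shared between iterations.
            string[] fields = ParseLine(line, delimiter);

            // Grow the schema if this line has more fields than any before it.
            while (table.Columns.Count < fields.Length)
            {
                table.Columns.Add("Column" + table.Columns.Count, typeof(string));
            }

            DataRow row = table.NewRow();
            for (int i = 0; i < fields.Length; i++)
            {
                row[i] = fields[i];
            }
            table.Rows.Add(row);
        }
        return table;
    }

    // Convenience overload for a file on disk.
    public static DataTable Load(string path, char delimiter)
    {
        using (StreamReader reader = File.OpenText(path))
        {
            return Load(reader, delimiter);
        }
    }
}
```

Only one line is held in memory at a time, and the table starts filling as soon as the first line has been read.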
This will not be faster using Parallel, so I have not attempted to do so. IO is your bottleneck here. Locking would be an option, but it isn’t going to help you massively.
As an unrelated aside, I notice that alResult is not declared inside the loop. That means that in your original code alResult is a captured variable that is shared between all the loop iterations, which means you are already overwriting each row horribly.
Edit: illustration of why Parallel is not relevant for reading 1,000,000 lines from a file:
Approach 1: use ReadAllLines to load the lines, then use Parallel to process them; this costs [fixed time] for the physical file IO, and then we parallelise. The CPU work is minimal, and we’ve basically spent [fixed time]. However, we’ve added a lot of threading overhead and memory overhead, and we couldn’t even start until all the file was loaded.
Approach 2: use a streaming API; read the file line by line, processing each line and adding it. The cost here is basically again [fixed time] for the actual IO bandwidth to load the file. But we now have no threading overhead, no sync conflicts, no huge memory to allocate, and we start filling the table right away.
Approach 3: If you really wanted, a third approach would be a reader/writer queue, with one dedicated thread processing file IO and enqueueing the lines, and a second that fills the DataTable. Frankly, it is a lot more moving parts, and the second thread will spend 95% of its time waiting for data from the file; stick to Approach 2!
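For completeness, Approach 3 could be sketched roughly as below, mainly to show how many extra moving parts it involves. This assumes `BlockingCollection<string>` from `System.Collections.Concurrent` as the queue (the answer does not name a specific queue type); `QueuedCsvLoader`, its `Load` method, and the fixed `columnCount` parameter are hypothetical, and `string.Split` again stands in for the question's `CSVParser`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.IO;
using System.Threading.Tasks;

public static class QueuedCsvLoader
{
    public static DataTable Load(TextReader reader, char delimiter, int columnCount)
    {
        var table = new DataTable();
        for (int i = 0; i < columnCount; i++)
        {
            table.Columns.Add("Column" + i, typeof(string));
        }

        // Bounded, so the reader cannot run arbitrarily far ahead of the consumer.
        using (var lines = new BlockingCollection<string>(1024))
        {
            // Producer: the only thread that touches the file.
            Task producer = Task.Run(() =>
            {
                try
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        lines.Add(line);
                    }
                }
                finally
                {
                    // Always signal completion so the consumer loop can end.
                    lines.CompleteAdding();
                }
            });

            // Consumer: the only thread that touches the DataTable.
            foreach (string line in lines.GetConsumingEnumerable())
            {
                string[] fields = line.Split(delimiter); // naive stand-in for CSVParser
                DataRow row = table.NewRow();
                for (int i = 0; i < fields.Length && i < columnCount; i++)
                {
                    row[i] = fields[i];
                }
                table.Rows.Add(row);
            }

            // Surfaces any exception thrown while reading.
            producer.Wait();
        }
        return table;
    }
}
```

Even in this small form there is a queue, a background task, completion signalling, and exception propagation to get right, while the DataTable is still filled by a single thread.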