I’m trying to read multiple files that have been serialized with ProtoBuf.NET using .NET Tasks like this:
    public static ResultsDump Amalgamate(RuntimeTypeModel model, IEnumerable<string> files)
    {
        var readDumpTasks =
            files.Select(fn =>
                Task<ResultsDump>.Factory.StartNew(() => {
                    try {
                        using (var dumpFile = new FileStream(fn, FileMode.Open))
                        {
                            var miniDump = (ResultsDump)model.Deserialize(dumpFile, null, typeof(ResultsDump));
                            if (miniDump == null) {
                                throw new Exception(string.Format("Failed to deserialize dump file {0}", fn));
                            }
                            //readDumps.Add(miniDump);
                            return miniDump;
                        }
                    }
                    catch (Exception e) {
                        throw new Exception(string.Format("cannot read dump file {0}: {1}", fn, e.Message), e);
                    }
                })).ToArray();
        Task.WaitAll(readDumpTasks);
        var allDumps = readDumpTasks.Select(t => t.Result).ToList();
        // Goes on.. irrelevant to the question
    }
For some reason, CPU usage doesn’t really go above a single core.
Is there some inherent lock in ProtoBuf.NET that prevents deserializing multiple files concurrently?
I’ve tried this with multiple RuntimeTypeModel instances as well as with a single one, and it always seems to peak at a very “low” CPU usage level.
Am I even wrong to be blaming ProtoBuf.NET? Is this the .NET memory allocator / TPL?
There is intentionally very limited locking in protobuf-net; it only really locks while checking the types (first run) to see what is needed. Once the model is understood, it is lock-free, and it is designed to be trivially parallel.
As noted in the comments, it is extremely likely that IO is your bottleneck. Indeed, parallelising access to the same physical disk / spindle will usually greatly reduce throughput, as the buffer is more contended and the disk has to do more seeking rather than contiguous reading.
This should be easy to test / validate: for a test run, instead of reading from disk, load them all into memory first.
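A minimal sketch of that preloading step, assuming the dump files fit comfortably in memory. DumpPreloader and LoadAll are hypothetical names introduced for illustration, not part of protobuf-net; the files are read one at a time on purpose, so the disk sees sequential access:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DumpPreloader
{
    // Hypothetical helper (not part of protobuf-net): reads every file fully
    // into memory so that later deserialization performs no disk I/O.
    public static List<MemoryStream> LoadAll(IEnumerable<string> files)
    {
        // File.ReadAllBytes reads the whole file into a byte array; wrapping
        // the bytes in a MemoryStream gives a seekable in-memory Stream.
        // ToList() forces all reads to complete here, sequentially.
        return files
            .Select(fn => new MemoryStream(File.ReadAllBytes(fn)))
            .ToList();
    }
}
```

Each returned MemoryStream can then stand in for the FileStream in the question's model.Deserialize(dumpFile, null, typeof(ResultsDump)) call, so the timed part of the test run involves no disk access.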
With all the files loaded, now run the same code but pass the MemoryStreams in as input. If it still doesn’t scale, it might be a bug. I strongly suspect, however, that you will find it parallelises nicely at that point.
Here’s a worked example, which for me saturates all my cores with concurrent deserialization: