I am currently working on an application that parses huge XML files.
Each file is processed differently, but all of them are parsed into a single object model.
Currently, the objects parsed from each XML file go into a single collection.
This collection is also used during parsing, e.g. if a similar object already exists, the parser updates that object's properties (such as incrementing a count) instead of adding a new object.
Looking at the CPU graph while the application is running, it is clear that it only uses part of the CPU (one core at a time at 100%), so I assume that running it in parallel will help shave off running time.
I am new to parallel programming, so any help is appreciated.
I would suggest the following technique: construct a queue of objects that are waiting to be processed, and dequeue them from multiple threads.
The access to the queue needs to be synchronized because you will enqueue and dequeue objects from multiple threads.
The difficulty lies in finding N, the number of threads, such that all the CPU cores are working at the same time.
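As a rough illustration of the idea, here is a minimal sketch in Java (the question does not say which language is used, so treat this as an assumption and adapt it). Everything named here is hypothetical: `parseFile` is a stand-in for your real XML parsing, the "object model" is reduced to a map from an object key to its count, and using the number of available processors for N is only a starting guess. The queue is a thread-safe `LinkedBlockingQueue`, and the shared collection is a `ConcurrentHashMap`, because it is also written from several threads and needs the same care as the queue.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class ParallelParseSketch {

    // Hypothetical stand-in for the real XML parsing step:
    // it just splits the file name on '-' and returns the pieces as object keys.
    static List<String> parseFile(String file) {
        return List.of(file.split("-"));
    }

    // Processes all files with n worker threads and returns key -> count.
    static Map<String, Integer> run(List<String> files, int n) throws InterruptedException {
        // Thread-safe queue, filled completely before any worker starts.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(files);
        // Shared collection; merge() performs the "insert or increment" atomically.
        Map<String, Integer> counts = new ConcurrentHashMap<>();

        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                String file;
                // poll() returns null once the queue is empty, which ends the worker.
                while ((file = queue.poll()) != null) {
                    for (String key : parseFile(file)) {
                        counts.merge(key, 1, Integer::sum);
                    }
                }
            });
            workers[i].start();
        }
        // Wait until every worker has drained the queue.
        for (Thread t : workers) {
            t.join();
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: one thread per core is a reasonable first value for N.
        int n = Runtime.getRuntime().availableProcessors();
        Map<String, Integer> counts = run(List.of("a-b", "b-c", "a-c-c"), n);
        System.out.println(counts);
    }
}
```

If parsing is purely CPU-bound, N close to the core count is usually a sensible first try; if the threads spend a lot of time waiting on disk, a larger N may work better. Measuring with a few values of `n` is the only reliable way to find out.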