I have a multithreaded application. Each module is executed in a separate thread.
Modules are:
- network module - receives/sends data from/to the network
- parser module - encodes/decodes network data to/from the internal representation
- 2 application modules - apply application logic to the above data, one after the other
- counter module - used to gather statistics from other modules
- timer module - used to schedule timers
- and much more ...
All threads use message queues for inter-thread communication (a std::deque synchronized by a condition variable and a mutex).
Some modules are used by other ones (e.g. all modules use the timer and counter modules), and this happens for every message received from the network, which must be handled at very high rates.
This is a pretty complex application and the design looks “reasonable”. On the other hand, I’m not sure that such a design, thread per module, is the “best” one. In particular, I’m afraid that such a design “encourages” a lot of context switches.
What do you think?
Are there any good guidelines or open source projects to learn from on how to do a “correct” design of a threaded application?
Thread-per-function designs are just naive: they assume that separating tasks by module onto threads will somehow achieve scalability.
This kind of design is inefficient, as very few task breakdowns yield exactly as many tasks as there are CPUs.
A far more rational design is to break tasks down into ‘jobs’ and then use a thread pooling mechanism to dispatch those jobs.
Advantages over the thread-per-module approach:
- Thread pools take advantage of all cores. With thread-per-module, if modules < cores you have cores sitting idle.
- Thread pools minimize contention and resource use by keeping the number of active threads equal to the number of cores. With thread-per-module, if modules > cores you incur needless extra context switches, and (on some platforms) each thread uses up other limited per-process resources (like virtual memory).
- Thread pools let a “module” do multiple jobs at a time. Thread-per-module means that the busiest module still only gets one core.