I recently began experimenting with the pseudo-boost threadpool (pseudo because it hasn’t been officially accepted yet).
As a simple exercise, I initialized the threadpool with a maximum of two threads.
Each task does two things:
- a CPU-intensive calculation
- writes out the result to disk
Question
How do I modify the model into a threadpool that does:
- a CPU-intensive calculation
and a single I/O thread that listens for completions from the threadpool, takes the resulting memory, and simply:
- writes out the result to disk
Should I simply have the task communicate to the I/O thread (spawned
as std::thread) through a std::condition_variable (essentially a mutexed queue of calculation results), or is there a way to
do it all within the threadpool library? Or is the gcc 4.6.1 implementation of
future and promise mature enough for me to pull this off?
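For reference, the future-based variant asked about here could look roughly like the sketch below. It assumes a toolchain where `<future>` behaves as C++11 specifies, and it uses `std::async` rather than the threadpool library, so each task gets its own thread instead of a pooled one. The names `calculate` and `write_results_via_futures` are made up for illustration, and the calling thread plays the role of the single I/O thread.

```cpp
#include <cassert>
#include <future>
#include <ostream>
#include <sstream>   // in-memory sink, handy when trying this out
#include <string>
#include <vector>

// Stand-in for the CPU-intensive calculation (hypothetical): sum of squares 1..n.
std::string calculate(int n) {
    long long sum = 0;
    for (int i = 1; i <= n; ++i) sum += static_cast<long long>(i) * i;
    return std::to_string(n) + ":" + std::to_string(sum);
}

// Launches every calculation asynchronously, then writes the results from the
// calling thread only, in submission order. Returns the number of lines written.
int write_results_via_futures(int tasks, std::ostream& sink) {
    assert(tasks >= 0);
    std::vector<std::future<std::string>> pending;
    for (int i = 0; i < tasks; ++i)
        pending.push_back(std::async(std::launch::async, calculate, i));

    int written = 0;
    for (auto& f : pending) {
        sink << f.get() << '\n';  // get() blocks until that result is ready
        ++written;
    }
    return written;
}
```

One trade-off of this shape: results are written in submission order, so a slow early task holds back results that are already finished, which a queue does not do.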
Answer
It looks like a simple mutex queue with a condition variable works fine.
By grouping read access and writes, in addition to using the threadpool, I got the following improvements:
- 2 core machine: 1h14m down to 33m (about 55% reduction in runtime)
- 4 core vm: 40m down to 18m (55% reduction in runtime)
Thanks to Martin James for his thoughtful answer. Before this exercise, I thought that my next computational server should have dual processors and a ton of memory. But now, with so much processing power inherent in the multiple cores and hyperthreading, I realize that money will probably be better spent dealing with the I/O bottleneck.
As Martin mentioned, having multiple drives or RAID configurations would probably help. I will also look into adjusting I/O buffer settings at the kernel level.
If there is only one local disk, one writer thread on the end of a producer-consumer queue would be my favourite. Seeks, networked-disk delays and other hiccups will not leave any pooled threads that have finished their calculation stuck trying to write to the disk. Other disk operations (e.g. selecting another location/file/folder) are also easier and quicker if only one thread is accessing the disk; the queue will take up the slack and allow seamless calculation during the latency.
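A minimal sketch of that arrangement, using only the standard library: plain `std::thread` workers stand in for the pooled tasks, and `ResultQueue`, `compute` and `run_pipeline` are hypothetical names, not part of any threadpool library. The sink is a `std::ostream`, so a `std::ofstream` would be passed for real disk output.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <ostream>
#include <queue>
#include <sstream>   // in-memory sink, handy when trying this out
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Mutex-protected queue: compute tasks push results, one writer thread pops.
class ResultQueue {
public:
    ResultQueue() : closed_(false) {}

    void push(std::string result) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(result));
        }
        ready_.notify_one();
    }

    // Call once all producers have finished so the writer can exit.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item arrives; returns false once closed and drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> items_;
    bool closed_;
};

// Stand-in for the CPU-intensive calculation: sum of squares 1..n.
std::string compute(int n) {
    long long sum = 0;
    for (int i = 1; i <= n; ++i) sum += static_cast<long long>(i) * i;
    return std::to_string(n) + ":" + std::to_string(sum);
}

// Runs `workers` compute threads plus one writer thread.
// Returns the number of results the writer wrote to `sink`.
int run_pipeline(int workers, int tasks_per_worker, std::ostream& sink) {
    assert(workers > 0 && tasks_per_worker >= 0);
    ResultQueue queue;
    int written = 0;  // touched only by the writer, read after it is joined

    std::thread writer([&] {
        std::string line;
        while (queue.pop(line)) {
            sink << line << '\n';  // the only place that touches the disk
            ++written;
        }
    });

    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w)
        pool.emplace_back([&queue, w, tasks_per_worker] {
            for (int t = 0; t < tasks_per_worker; ++t)
                queue.push(compute(w * tasks_per_worker + t));
        });

    for (auto& th : pool) th.join();
    queue.close();
    writer.join();
    return written;
}
```

The queue here is unbounded, so if calculation consistently outruns the disk, memory use will grow; a bounded queue would trade that for occasional stalls in the compute threads.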
Writing directly from the calculation task, or submitting the result-write as a separate task, would work OK, but you would need more threads in the pool to achieve pause-free operation.
Everything changes if there is more than one disk. More than one writer thread would then become a worthwhile proposition because of the increased overall performance. I would then probably go with an array/list of queues/write-threads, one for each disk.
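The multi-disk layout could be sketched as below, with one queue and one writer thread per disk. Everything named here (`DiskWriter`, `write_across_disks`, the round-robin routing by result index) is an assumption for illustration; each `std::ostream` stands in for a file opened on a different disk.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <ostream>
#include <queue>
#include <sstream>   // in-memory sinks, handy when trying this out
#include <string>
#include <thread>
#include <utility>
#include <vector>

// One instance per disk: owns its own queue and its own writer thread.
class DiskWriter {
public:
    explicit DiskWriter(std::ostream& sink)
        : sink_(sink), closed_(false), written_(0),
          thread_(&DiskWriter::run, this) {}

    ~DiskWriter() { finish(); }

    void submit(std::string result) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(result));
        }
        ready_.notify_one();
    }

    // Drains the queue, stops the thread, returns how many results were written.
    int finish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_one();
        if (thread_.joinable()) thread_.join();
        return written_;
    }

private:
    void run() {
        for (;;) {
            std::string line;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
                if (items_.empty()) return;  // closed and fully drained
                line = std::move(items_.front());
                items_.pop();
            }
            sink_ << line << '\n';  // slow I/O happens outside the lock
            ++written_;
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> items_;
    std::ostream& sink_;
    bool closed_;
    int written_;
    std::thread thread_;  // declared last so it starts after the other members
};

// Routes result i to writer i % (number of disks); returns per-disk counts.
std::vector<int> write_across_disks(const std::vector<std::ostream*>& sinks,
                                    int results) {
    assert(!sinks.empty());
    std::vector<std::unique_ptr<DiskWriter>> writers;
    for (std::ostream* s : sinks)
        writers.push_back(std::unique_ptr<DiskWriter>(new DiskWriter(*s)));

    for (int i = 0; i < results; ++i)
        writers[static_cast<std::size_t>(i) % writers.size()]
            ->submit("result " + std::to_string(i));

    std::vector<int> counts;
    for (auto& w : writers) counts.push_back(w->finish());
    return counts;
}
```

Round-robin is just the simplest routing rule; if the disks differ in speed, sending each result to the shortest queue may balance better.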