I’ve written a program in C++ that sometimes uses over 300 threads. In my program, I have an array of structs, and the length of the array equals the number of threads. Let’s assume that I have 400 structs and therefore 400 threads.
In a single iteration of a for loop, I apply a function to each of the 400 structs, and this function is executed in a thread. Therefore, I have 400 threads running concurrently.
(I am using the boost thread library).
I’ve tried to give a breakdown of what my code looks like (it is not the actual code):
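#include <boost/thread.hpp>
#include <vector>

struct my_struct{
    // Structure's members
};
// Sized to 400 here so that my_vec.at(i) below is valid
std::vector<my_struct> my_vec(400);
void my_fun(my_struct* my_str){
    // Operations on my_str
}
int main(){
    std::vector<boost::thread> thr(400);
    for (int k = 0; k < 300; k++){
        // Start one thread per struct
        for (int i = 0; i < 400; i++){
            thr.at(i) = boost::thread(my_fun, &my_vec.at(i));
        }
        // Wait for all 400 threads before starting the next iteration
        for (int m = 0; m < 400; m++){
            thr.at(m).join();
        }
    }
}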
The function I am using is computationally intensive, and from the code above, I use 400 threads to do calculations and this is done 300 times. Is there any more efficient way of performing this task? I’m not sure if having so many threads active at a single time may affect performance. I’ve heard of the threadpool library, but I’m not sure whether it’ll provide any benefit to me. Any help is appreciated.
Thank You Very Much.
There is absolutely no benefit to spawning 400 CPU-bound threads unless you have 400+ processor cores in your target machine.
It would be impossible to tell you with any certainty how to better distribute your workload without knowing what sort of computations you’re performing, and on what kind of data.
As a shot in the dark, judging from what you have posted, a first stab would be to use N threads (see below), and divide your 400 objects among them so that each thread is responsible for processing approximately 400/N objects. Each thread can loop 300 times, and on each iteration it can process each of its assigned objects.
N is an arbitrary number; in fact, I recommend trying different values and comparing the performance results. However, unless your threads are performing I/O or other operations that waste time blocking on non-computational operations, N should be no larger than the number of processor cores in your machine (try it and watch your performance drop quickly).
Edit: As per the ongoing discussion, it would be advisable to employ a queue of your objects from which each of your N threads can simply pop as they are ready for more work. The queue will of course need to be thread-safe. For optimal performance, a lock-free queue should be implemented. There’s a good paper here. The implementation should be simplified by the fact that you are fully populating the queue once and therefore only need thread-safe reads.
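To make the "populate once, then only pop" idea concrete, here is a minimal sketch rather than a definitive implementation. It assumes C++11 and uses std::thread and std::atomic in place of boost::thread so that it is self-contained. Because the set of objects is fixed before the workers start, the "queue" can be reduced to the vector itself plus a shared atomic index: each worker claims the next unprocessed index with fetch_add. The my_struct member, the body of my_fun, and the helper names process_all and default_thread_count are hypothetical stand-ins, not part of the original code. Joining after every pass is also an assumption; it keeps the behaviour of the question's loop, where all objects finish one pass before the next begins.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's my_struct.
struct my_struct {
    long value = 0;
};

// Hypothetical stand-in for the CPU-bound my_fun.
void my_fun(my_struct* s) {
    s->value += 1;
}

// Runs `passes` passes over `items` with `n_threads` workers.
// The shared atomic index plays the role of the pre-filled queue:
// fetch_add hands every index to exactly one worker.
void process_all(std::vector<my_struct>& items, int passes, unsigned n_threads) {
    assert(n_threads > 0);
    for (int k = 0; k < passes; ++k) {
        std::atomic<std::size_t> next(0);
        auto worker = [&items, &next]() {
            for (;;) {
                // Claim the next unprocessed index, or stop when none remain.
                std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
                if (i >= items.size()) break;
                my_fun(&items[i]);
            }
        };
        std::vector<std::thread> pool;
        pool.reserve(n_threads);
        for (unsigned t = 0; t < n_threads; ++t) pool.emplace_back(worker);
        // Joining here acts as a barrier between passes.
        for (auto& th : pool) th.join();
    }
}

// hardware_concurrency() may return 0 when the value is unknown,
// so fall back to a single worker in that case.
unsigned default_thread_count() {
    return std::max(1u, std::thread::hardware_concurrency());
}
```

A call such as process_all(my_vec, 300, default_thread_count()) would replace the nested loops in the question. This version still creates N threads per pass (300 × N in total instead of 300 × 400); if the passes do not depend on each other across objects, keeping the same N threads alive for all 300 passes would avoid even that cost.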