I have a C++ app in which I create pthreads to run user-provided functions. I want to be alerted in some way when a thread exits so that I can remove it from the array of pthread_t that I use to keep track of the threads. Is there a way to do this, or should the function just set some “magic value”? Because my main code that spawns the pthreads is in a sort of runloop, I can easily check for an exit condition.
Also, is using a std::vector<pthread_t> to keep track of my threads overkill? The number of threads is not constant; many threads or very few could be running. Or is there another STL container that would be good for these additions and deletions (additions always at one end, deletions almost anywhere)? Is there some other structure for keeping track of pthreads? Would a stack or a list be right here? Or would a standard C array with a generous maximum be good? Due to the nature of the problem, I could also maintain a fixed-size array of worker threads to which I pass the user functions that must be executed. Is this a good solution?
Sorry for the long confused question, but I have only worked with threading in dynamic languages where this would never be an issue.
EDIT (3/08/12):
After reading @jojojapan’s answer, I have decided to use a threadpool of sorts. In my structure, I have one producer (a thread in a runloop) and many consumers (the worker threads in the pool). Is there a data structure that is made for multithreaded one-producer many-consumer use? Or should I just use a std::queue with a pthread_mutex_t on it?
One option you might want to consider is to not actually end and delete threads once they have finished a task, but instead keep them alive and have them wait for a new task to be assigned to them. You can accomplish this by doing two things:
If you really want to send a signal once a thread ends, you can use a pthread_cond_t and call pthread_cond_signal on it just before a thread reaches its return statement. Of course, that assumes that there is some other thread running that waits for these signals and acts upon them by removing the corresponding thread from the vector. Details on the usage are described on the corresponding man page, but also in this SO post.
Edit related to the comment and the edited part of the question:
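To illustrate the pthread_cond_signal approach to exit notification described above, here is a minimal sketch. The names ExitNotifier, WorkerArg, worker and run_and_reap are hypothetical, and the sketch assumes each thread reports its slot index under a mutex so that the waiting thread knows which pthread_t to join and remove.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: shared state through which workers report completion.
struct ExitNotifier {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    std::vector<std::size_t> finished;  // slots of threads that have finished

    ExitNotifier() {
        pthread_mutex_init(&mutex, nullptr);
        pthread_cond_init(&cond, nullptr);
    }
    ~ExitNotifier() {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);
    }
};

struct WorkerArg {
    ExitNotifier* notifier;
    std::size_t slot;  // index of this thread in the bookkeeping vector
};

void* worker(void* p) {
    WorkerArg* arg = static_cast<WorkerArg*>(p);
    // ... the user-provided function would run here ...

    // Report completion just before returning.
    pthread_mutex_lock(&arg->notifier->mutex);
    arg->notifier->finished.push_back(arg->slot);
    pthread_cond_signal(&arg->notifier->cond);
    pthread_mutex_unlock(&arg->notifier->mutex);
    return nullptr;
}

// Starts n workers, then joins each one as soon as it reports that it is done.
// Returns the number of threads that were joined.
std::size_t run_and_reap(std::size_t n) {
    ExitNotifier notifier;
    std::vector<pthread_t> threads(n);
    std::vector<WorkerArg> args(n);
    std::size_t started = 0;
    for (std::size_t i = 0; i < n; ++i) {
        args[i].notifier = &notifier;
        args[i].slot = i;
        if (pthread_create(&threads[i], nullptr, worker, &args[i]) == 0) {
            ++started;
        }
    }

    std::size_t reaped = 0;
    while (reaped < started) {
        pthread_mutex_lock(&notifier.mutex);
        while (notifier.finished.empty()) {  // loop guards against spurious wakeups
            pthread_cond_wait(&notifier.cond, &notifier.mutex);
        }
        std::vector<std::size_t> done;
        done.swap(notifier.finished);
        pthread_mutex_unlock(&notifier.mutex);

        for (std::size_t slot : done) {
            pthread_join(threads[slot], nullptr);  // the slot can now be reused or erased
            ++reaped;
        }
    }
    return reaped;
}
```

In a runloop you would probably check the finished list without blocking, or use pthread_cond_timedwait, rather than waiting indefinitely as this sketch does.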
Regarding the number of worker threads: That depends on the resources the threads use most. If what those threads do is mostly computation and a bit of memory access, in other words, if they are CPU-bound, it makes sense to use as many threads as your CPU can sustain. Specifically, your CPU has a certain number of cores and a certain number of hardware threads per core that it can run before they start slowing each other down. The software threads you create should be about as many, or perhaps a few more (up to twice the number of hardware threads is reasonable according to what @Tudor says here). However, if your threads make heavy use of memory (memory-bound), the hard disk (IO-bound), or other resources such as the network, NFS, or some other server, you might want to reduce the number of threads in order (a) not to cause them to block each other, and (b) not to put an unreasonable load on certain resources. Determining the right number of threads may be a matter of experimenting, and keeping the number configurable is generally a good idea.
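As a rough sketch of keeping the thread count configurable, the hypothetical function pick_worker_count below starts from the hardware thread count reported by std::thread::hardware_concurrency (C++11). Halving the count for non-CPU-bound work and the fallback of 2 are arbitrary assumptions for illustration, not measured recommendations.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper: choose a pool size, letting an explicit setting win.
unsigned pick_worker_count(bool cpu_bound, unsigned configured = 0) {
    if (configured > 0) {
        return configured;  // value from a config file or command line
    }
    unsigned hw = std::thread::hardware_concurrency();  // may return 0 if unknown
    if (hw == 0) {
        hw = 2;  // arbitrary fallback, an assumption of this sketch
    }
    // CPU-bound work: about one thread per hardware thread.
    // Otherwise: start lower (assumed here: half) and tune by experiment.
    return cpu_bound ? hw : std::max(1u, hw / 2);
}
```

Treat the returned value as a starting point for experiments rather than a final answer.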
Regarding the best data structure to store work tasks: The concurrent bounded queue mentioned in the comments of the post I cited further above is probably very good. I haven’t tried it myself, though. But if you’d like to keep things simple, a standard std::queue, or even simply a std::vector, would not be a bad choice, if you protect them properly using the signal/mutex technique.
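A minimal sketch of the simple option, assuming a std::queue guarded by a pthread_mutex_t and a pthread_cond_t, with one producer and several long-lived workers. TaskQueue, TaskItem, pool_worker, increment and run_pool_demo are hypothetical names, and the close() step is one possible way to shut the pool down.

```cpp
#include <pthread.h>
#include <atomic>
#include <queue>
#include <vector>

typedef void (*Task)(void*);
struct TaskItem { Task fn; void* arg; };

class TaskQueue {
public:
    TaskQueue() : closed_(false) {
        pthread_mutex_init(&mutex_, nullptr);
        pthread_cond_init(&cond_, nullptr);
    }
    ~TaskQueue() {
        pthread_cond_destroy(&cond_);
        pthread_mutex_destroy(&mutex_);
    }
    // Producer side: enqueue a task and wake one waiting worker.
    void push(Task fn, void* arg) {
        TaskItem item = {fn, arg};
        pthread_mutex_lock(&mutex_);
        tasks_.push(item);
        pthread_cond_signal(&cond_);
        pthread_mutex_unlock(&mutex_);
    }
    // No more tasks will arrive; wake every worker so it can exit.
    void close() {
        pthread_mutex_lock(&mutex_);
        closed_ = true;
        pthread_cond_broadcast(&cond_);
        pthread_mutex_unlock(&mutex_);
    }
    // Consumer side: blocks until a task is available.
    // Returns false once the queue is closed and drained.
    bool pop(TaskItem& out) {
        pthread_mutex_lock(&mutex_);
        while (tasks_.empty() && !closed_) {
            pthread_cond_wait(&cond_, &mutex_);
        }
        bool have_task = !tasks_.empty();
        if (have_task) {
            out = tasks_.front();
            tasks_.pop();
        }
        pthread_mutex_unlock(&mutex_);
        return have_task;
    }
private:
    pthread_mutex_t mutex_;
    pthread_cond_t cond_;
    std::queue<TaskItem> tasks_;
    bool closed_;
};

// Each pool thread stays alive and keeps taking tasks until the queue closes.
void* pool_worker(void* p) {
    TaskQueue* queue = static_cast<TaskQueue*>(p);
    TaskItem item;
    while (queue->pop(item)) {
        item.fn(item.arg);
    }
    return nullptr;
}

// Example task: atomically increment a shared counter.
void increment(void* p) {
    static_cast<std::atomic<int>*>(p)->fetch_add(1);
}

// Runs `tasks` increments on `workers` threads and returns the final count.
int run_pool_demo(int workers, int tasks) {
    TaskQueue queue;
    std::atomic<int> counter(0);
    std::vector<pthread_t> pool;
    for (int i = 0; i < workers; ++i) {
        pthread_t tid;
        if (pthread_create(&tid, nullptr, pool_worker, &queue) == 0) {
            pool.push_back(tid);
        }
    }
    for (int i = 0; i < tasks; ++i) {
        queue.push(increment, &counter);
    }
    queue.close();
    for (pthread_t tid : pool) {
        pthread_join(tid, nullptr);
    }
    return counter.load();
}
```

This queue is unbounded, so a fast producer can outrun the workers; a bounded variant would also make push wait while the queue is full.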