I have a thread pool. The main() function kicks off the classic pool setup: a boss thread and a few worker threads. Most of the code is complete; the missing part is the error handling.
When an error occurs in one of the boss/worker threads, pthread_exit() is called. How does the main() thread know that something went wrong in the pool so that it can restart it?
If you want to save error or recovery information, or you want a wait that does not block indefinitely, you can use a condition variable together with its associated mutex and a structure holding the failing thread, the error, and whatever recovery information you need. All of these variables must be visible to both the boss and the workers, so the simplest option is to make them global.
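A minimal sketch of that shared state might look like the following. The struct layout and the names pool_error, err_mutex, err_cond and pool_error_reset are only illustrative assumptions; the pending flag is an addition that gives the waiting thread something to check.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical error record; every field name here is illustrative. */
struct pool_error {
    int       pending;   /* nonzero while an unhandled error is stored */
    pthread_t thread;    /* the thread that failed */
    int       code;      /* errno-style error code */
    char      info[64];  /* free-form recovery hint */
};

/* Global so that the boss and every worker can reach them. */
pthread_mutex_t   err_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t    err_cond  = PTHREAD_COND_INITIALIZER;
struct pool_error err;

/* Clear the record. Once threads are running, call this with err_mutex held. */
void pool_error_reset(void)
{
    memset(&err, 0, sizeof err);
}
```

Static initializers avoid separate pthread_mutex_init and pthread_cond_init calls, which is enough when the default attributes are acceptable.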
In the boss thread, first initialize the err structure and then lock the mutex.
Then wait for the condition with pthread_cond_wait, which releases the mutex while waiting and reacquires it before returning.
Once the condition has been signaled, handle the error and call pthread_join to collect the return value of the failed thread. Note that pthread_cond_wait blocks until it is signaled; if you need a bounded wait, use pthread_cond_timedwait, which takes a third parameter, a struct timespec *, holding the absolute system time at which the wait expires. Because a condition wait can wake up spuriously, keep a flag in the err structure and re-check it in a loop around the wait. Finally, remember to unlock the mutex.
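The bounded wait on the boss side could be sketched as follows. This assumes a plain int flag err_pending as the predicate and a hypothetical helper wait_for_error; neither name comes from the original code. The deadline is built from CLOCK_REALTIME, which is the default clock for condition variables.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

pthread_mutex_t err_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  err_cond  = PTHREAD_COND_INITIALIZER;
int err_pending;  /* predicate, guarded by err_mutex (assumed name) */

/* Wait up to timeout_sec seconds for a worker to report an error.
 * Returns 0 if an error is pending, ETIMEDOUT if the deadline passed. */
int wait_for_error(int timeout_sec)
{
    struct timespec deadline;
    int rc = 0;

    /* pthread_cond_timedwait expects an absolute time, not an interval. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&err_mutex);
    /* Loop on the predicate to absorb spurious wakeups. */
    while (!err_pending && rc == 0)
        rc = pthread_cond_timedwait(&err_cond, &err_mutex, &deadline);
    if (err_pending)
        rc = 0;  /* an error that arrived right at the deadline still counts */
    pthread_mutex_unlock(&err_mutex);
    return rc;
}
```

A caller can poll with a short timeout between other housekeeping tasks, handling the error only when the function returns 0.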
In the failing worker thread, before exiting, lock the mutex, fill in the err structure, signal the boss thread with pthread_cond_signal, unlock the mutex, and then exit.
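Putting the worker and boss sides together, one round trip might look like the sketch below. The worker here fails on purpose with a code passed as its argument, and the names worker, run_failing_worker and the struct fields are assumptions made for illustration, not part of the original pool.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct pool_error {
    int       pending;  /* predicate checked by the boss */
    pthread_t thread;   /* the thread that failed */
    int       code;     /* error code reported by the worker */
};

pthread_mutex_t   err_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t    err_cond  = PTHREAD_COND_INITIALIZER;
struct pool_error err;

/* Worker that simulates a failure: report it, then exit. */
static void *worker(void *arg)
{
    pthread_mutex_lock(&err_mutex);
    err.thread  = pthread_self();
    err.code    = (int)(intptr_t)arg;
    err.pending = 1;
    pthread_cond_signal(&err_cond);
    pthread_mutex_unlock(&err_mutex);

    pthread_exit(arg);  /* the same value is later returned by pthread_join */
}

/* Boss side: start one worker, wait for its report, join it.
 * Returns the worker's exit value, or -1 if the thread could not be created. */
int run_failing_worker(int code)
{
    pthread_t tid, failed;
    void *retval = NULL;

    pthread_mutex_lock(&err_mutex);
    err.pending = 0;
    if (pthread_create(&tid, NULL, worker, (void *)(intptr_t)code) != 0) {
        pthread_mutex_unlock(&err_mutex);
        return -1;
    }
    while (!err.pending)  /* guards against spurious wakeups */
        pthread_cond_wait(&err_cond, &err_mutex);
    failed = err.thread;
    err.pending = 0;      /* mark the error as handled */
    pthread_mutex_unlock(&err_mutex);

    /* Join outside the lock so other workers can still report. */
    pthread_join(failed, &retval);
    return (int)(intptr_t)retval;
}
```

In a real pool the boss would restart the failed worker after the join. With several workers failing at once, a single err slot can be overwritten before the boss reads it, so a queue of error records may be a safer choice.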