(This question might be somewhat related to "pthread_exit in signal handler causes segmentation fault".)

I’m writing a deadlock prevention library in which a checking thread is always running, doing graph calculations and checking whether there is a deadlock. If there is, it signals one of the conflicting threads. When that thread catches the signal, it releases all the mutexes it owns and exits. There are multiple resource mutexes (obviously) and one critical region mutex; every call that acquires or releases a resource lock or does graph calculations must obtain this lock first.

Now here is the problem. With 2 competing threads (not counting the checking thread), the program sometimes deadlocks after one thread gets killed. gdb says the dead thread owns the critical region lock and never released it. After adding a breakpoint in the signal handler and stepping through, it appears that the lock belongs to someone else (as expected) right before pthread_exit(), but the ownership magically goes to this thread after pthread_exit().
The only guess I can come up with is that the thread to be killed was blocked in pthread_mutex_lock, trying to gain the critical region lock (because it wanted another resource mutex), when the signal came and interrupted pthread_mutex_lock. Since this call is not signal-safe, did something weird happen? For example, might the signal handler have returned, after which the thread got the lock and then exited? I don’t know. Any insight is appreciated!
pthread_exit is not async-signal-safe, and thus the only way you can call it from a signal handler is if you ensure that the signal is not interrupting any non-async-signal-safe function.

As a general principle, using signals as a method of communication with threads is usually a really bad idea. You end up mixing two issues that are already difficult enough on their own: thread-safety (proper synchronization between threads) and reentrancy within a single thread.
If your goal with signals is just to instruct a thread to terminate, a better mechanism might be pthread_cancel. To use this safely, however, the thread that will be cancelled must set up cancellation cleanup handlers at the proper points and/or disable cancellation temporarily when it’s not safe (with pthread_setcancelstate). Also, be aware that pthread_mutex_lock is not a cancellation point. There’s no safe way to interrupt a thread that’s blocked waiting to obtain a mutex, so if you need interruptibility like this, you probably need either a more elaborate synchronization setup with condition variables (condvar waits are cancellable), or you could use semaphores instead of mutexes.

Edit: If you really do need a way to terminate threads waiting for mutexes, you could replace calls to pthread_mutex_lock with calls to your own function that loops calling pthread_mutex_timedlock and checking for an exit flag on each timeout.
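A minimal sketch of such a wrapper might look like the following. The function name lock_or_exit, the per-thread exit flag, the 100 ms polling interval, and the choice of ECANCELED as the "exit requested" return value are all assumptions made for illustration; none of them come from the question. Note also that pthread_mutex_timedlock is not available on every platform.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper (not part of pthreads): tries to lock *m, but gives up
 * as soon as *exit_flag becomes true. The flag is assumed to be set by the
 * checking thread when it picks this thread as the victim.
 *
 * Returns 0 with the mutex held, ECANCELED if an exit was requested while
 * waiting (mutex NOT held), or another error code from pthread_mutex_timedlock. */
int lock_or_exit(pthread_mutex_t *m, atomic_bool *exit_flag)
{
    assert(m != NULL && exit_flag != NULL);

    for (;;) {
        if (atomic_load(exit_flag))
            return ECANCELED;

        /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline. */
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 100L * 1000L * 1000L;   /* assumed 100 ms poll interval */
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        int rc = pthread_mutex_timedlock(m, &deadline);
        if (rc != ETIMEDOUT)
            return rc;   /* 0 on success, or a real error */
        /* Timed out: loop around and re-check the exit flag. */
    }
}
```

When the wrapper returns ECANCELED, the calling thread is running ordinary code rather than a signal handler, so it can release the resource mutexes it owns and call pthread_exit without the async-signal-safety problem described above.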