I have a multi-threaded application that creates 48 threads that all need to access a common attribute (a std::map). The map is only written to when the threads start; the rest of the time it is only read. This seems like the perfect use case for a pthread_rwlock_t, and all appears to be working well.
I ran across a completely unrelated seg-fault and started analyzing the core. Using gdb, I executed the command info threads and was quite surprised at the results. I observed that several threads were actually reading from the map as expected, but the strange part is that several threads were blocked in pthread_rwlock_rdlock() waiting on the rw_lock.
Here is the stack trace for a thread that is waiting on the lock:
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xf76fe159 in __lll_lock_wait () from /lib/libpthread.so.0
#2 0xf76fab5d in pthread_rwlock_rdlock () from /lib/libpthread.so.0
#3 0x0804a81a in DiameterServiceSingleton::getDiameterService(void*) ()
With so many threads, it's difficult to say how many were reading and how many were blocked, but I don't understand why any threads would be blocked waiting to read, considering other threads are already reading.
So here is my question: Why are some threads blocked waiting to read a rw_lock, when other threads are already reading from it? It appears as though there is a limit to the number of threads that can simultaneously read.
I've looked at the pthread_rwlockattr_t functions and didn't see anything related.
The OS is Linux, SUSE 11.
Here is the related code:
// The lock is initialized once, with default attributes (NULL)
{
    pthread_rwlock_init(&serviceMapRwLock_, NULL);
}
// This method is called for each request processed by the threads
Service *ServiceSingleton::getService(void *serviceId)
{
    Service *service = NULL;
    pthread_rwlock_rdlock(&serviceMapRwLock_);
    ServiceMapType::const_iterator iter = serviceMap_.find(serviceId);
    if(iter != serviceMap_.end())
    {
        // Copy the pointer while the read lock is still held, so the
        // iterator is never dereferenced after the unlock
        service = iter->second;
    }
    pthread_rwlock_unlock(&serviceMapRwLock_);
    return service;
}
// This method is only called when the app is starting
void ServiceSingleton::addService(void *serviceId, Service *service)
{
    pthread_rwlock_wrlock(&serviceMapRwLock_);
    serviceMap_[serviceId] = service;
    pthread_rwlock_unlock(&serviceMapRwLock_);
}
Update:
As mentioned in the comments by MarkB, if I had used pthread_rwlockattr_setkind_np() to give priority to writers, and a writer were blocked waiting, then the observed behavior would make sense. But I'm using the default value, which I believe gives priority to readers. I just verified that there are no threads blocked waiting to write. I also updated the code as suggested by @Shahbaz in the comments and got the same results.
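For reference, the preference carried by a default-initialized attribute object can be checked with a small sketch like the one below. It assumes glibc, where the non-portable pthread_rwlockattr_getkind_np() call is available; default_rwlock_kind is a hypothetical helper name and is not part of the application code.

```cpp
#include <pthread.h>

// Hypothetical helper: returns the lock kind stored in a freshly
// initialized attribute object (glibc-specific, hence the _np suffix).
int default_rwlock_kind()
{
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);

    int kind = -1;  // overwritten by the call below on success
    pthread_rwlockattr_getkind_np(&attr, &kind);

    pthread_rwlockattr_destroy(&attr);
    return kind;
}
```

On glibc, comparing the result against PTHREAD_RWLOCK_PREFER_READER_NP should confirm the reader-preferring default that pthread_rwlock_init(..., NULL) is expected to use.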
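You merely observed the inherent cost of acquiring a lock. Acquisition takes some time, and you happened to catch those threads in the middle of it. This is especially likely when the operation protected by the lock is very short. The __lll_lock_wait frame in your trace is consistent with this: in glibc versions of that era, pthread_rwlock_rdlock() appears to take a short-lived internal low-level lock while it checks flags and updates the reader count, so threads can briefly queue on that internal lock even though the rwlock itself allows them all to read concurrently.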
To demonstrate, I implemented a short routine that sleeps after acquiring the rwlock. The main thread waits for the threads to finish, but before waiting it prints the concurrency they achieved. The main thread also waits for the counter to reach 50.
Edit: Simplified the example using C++11 thread support:
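The original listing is not reproduced here, so the following is only a minimal sketch of the same idea, assuming C++11 std::thread on top of a default-initialized pthread_rwlock_t. The helper name count_concurrent_readers and the variables holding and release are hypothetical. Each thread takes the read lock, bumps a counter, and keeps holding the lock until the caller has seen every reader inside at once.

```cpp
#include <pthread.h>
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper: starts n_threads readers and returns how many of
// them were holding the read lock at the same moment.
int count_concurrent_readers(int n_threads)
{
    pthread_rwlock_t rwlock;
    pthread_rwlock_init(&rwlock, NULL);  // default attributes, as in the question

    std::atomic<int> holding(0);         // readers that have acquired the lock
    std::atomic<bool> release(false);    // set once all readers are inside

    std::vector<std::thread> readers;
    for (int i = 0; i < n_threads; ++i) {
        readers.emplace_back([&] {
            pthread_rwlock_rdlock(&rwlock);
            ++holding;
            // Sleep while holding the read lock until told to let go.
            while (!release.load()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            pthread_rwlock_unlock(&rwlock);
        });
    }

    // Wait for the counter to reach n_threads. No reader has unlocked yet,
    // so this only happens if all of them hold the lock simultaneously.
    while (holding.load() < n_threads) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    int observed = holding.load();

    release.store(true);
    for (std::thread &t : readers) {
        t.join();
    }
    pthread_rwlock_destroy(&rwlock);
    return observed;
}
```

Calling count_concurrent_readers(50) from main() and printing the result should report 50 when no writer is involved. If the rwlock really capped the number of simultaneous readers below that, the call would hang instead of returning.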