I am working on a program that uses MPI (Open MPI 1.4.3) and pthreads, written in C++ on Linux.
Some of the MPI nodes have a queuing system implemented with pthreads.
The idea is simple: one thread adds elements to a queue, and a few other “worker” threads pick up objects and do their job on them (not rocket science).
Please consider two examples of the loop my worker threads use to pick up elements.
The first example works fine unless -O3 optimization is specified. In that case it loops endlessly without picking up anything.
    while (true){
        if (t_exitSignal[tID]){
            dorun = false;
            break;
        }
        //cout<<"w8\n";
        //check if the queue has some work for us
        if (!frame_queue->empty()){
            //try to get the lock and recheck that the queue is not empty
            pthread_mutex_lock( &mutex_frame_queue );
            if (!frame_queue->empty()){
                cout<<"Pickup "<<tID<<endl;
                con = frame_queue->front();
                frame_queue->pop();
                t_idling[tID] = false;
                pthread_mutex_unlock( &mutex_frame_queue );
                break;
            }
            pthread_mutex_unlock( &mutex_frame_queue );
        }
    }
Now consider this one. It is exactly the same code, except that the mutex gets locked before I check frame_queue->empty(). This works fine at all optimization levels.
    while (true){
        if (t_exitSignal[tID]){
            dorun = false;
            break;
        }
        //cout<<"w8\n";
        //get the lock first
        pthread_mutex_lock( &mutex_frame_queue );
        //check if the queue has some work for us
        if (!frame_queue->empty()){
            cout<<"Pickup "<<tID<<endl;
            con = frame_queue->front();
            frame_queue->pop();
            t_idling[tID] = false;
            pthread_mutex_unlock( &mutex_frame_queue );
            break;
        }
        pthread_mutex_unlock( &mutex_frame_queue );
    }
Just in case it makes a difference, this is how I populate the queue from the other thread:
    pthread_mutex_lock( &mutex_frame_queue );
    //add the same container to the queue to make it available to the threads
    frame_queue->push(*cursor);
    pthread_mutex_unlock( &mutex_frame_queue );
My question is: why does the first example stop working when I compile with the -O3 option?
Any other suggestions for the queuing system?
Thanks a lot!
SOLUTION: This is what I came up with in the end. It seems to work much better than either of the methods above (just in case someone is interested 😉).
    while (true){
        if (t_exitSignal[tID]){
            dorun = false;
            break;
        }
        //get the lock and check that the queue is not empty
        pthread_mutex_lock( &mutex_frame_queue );
        if (!frame_queue->empty()){
            con = frame_queue->front();
            frame_queue->pop();
            t_idling[tID] = false;
            pthread_mutex_unlock( &mutex_frame_queue );
            break;
        }else{
            //releases the mutex while waiting and re-acquires it before returning;
            //the outer loop then rechecks the queue
            pthread_cond_wait(&conf_frame_queue, &mutex_frame_queue);
            pthread_mutex_unlock( &mutex_frame_queue );
        }
    }
Adding to the queue:
    pthread_mutex_lock( &mutex_frame_queue );
    //add the same container to the queue to make it available to the threads
    frame_queue->push(*cursor);
    //wake up one waiting thread
    pthread_cond_signal(&conf_frame_queue);
    pthread_mutex_unlock( &mutex_frame_queue );
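For anyone adapting this, here is a minimal sketch of the same condition-variable idea packaged as a self-contained class. The names WorkQueue, push, pop, shutdown, worker_main and run_demo are made up for illustration, and int stands in for the real element type. One thing it does differently, as an assumption about what is wanted: a worker blocked in pthread_cond_wait only rechecks an exit flag after something wakes it, so the sketch keeps the "closed" flag under the same mutex and uses pthread_cond_broadcast on shutdown to wake every waiter.

```cpp
#include <pthread.h>
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical wrapper around a queue, a mutex and a condition variable.
struct WorkQueue {
    std::queue<int> items;   // int stands in for the real element type
    bool closed;             // true once no more work will be pushed
    pthread_mutex_t mutex;
    pthread_cond_t cond;

    WorkQueue() : closed(false) {
        int rc = pthread_mutex_init(&mutex, NULL);
        assert(rc == 0);
        rc = pthread_cond_init(&cond, NULL);
        assert(rc == 0);
        (void)rc;
    }
    ~WorkQueue() {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);
    }

    void push(int value) {
        pthread_mutex_lock(&mutex);
        items.push(value);
        pthread_cond_signal(&cond);       // wake one waiting worker
        pthread_mutex_unlock(&mutex);
    }

    void shutdown() {
        pthread_mutex_lock(&mutex);
        closed = true;
        pthread_cond_broadcast(&cond);    // wake every waiting worker
        pthread_mutex_unlock(&mutex);
    }

    // Returns true and fills `out` when an item was taken; returns false
    // once the queue is closed and fully drained.
    bool pop(int &out) {
        pthread_mutex_lock(&mutex);
        // Loop on the predicate: wakeups can be spurious, and another
        // worker may have taken the item first.
        while (items.empty() && !closed) {
            pthread_cond_wait(&cond, &mutex);
        }
        bool got = !items.empty();
        if (got) {
            out = items.front();
            items.pop();
        }
        pthread_mutex_unlock(&mutex);
        return got;
    }
};

struct WorkerArgs {
    WorkQueue *queue;
    long sum;                // per-worker result, read after join
};

void *worker_main(void *p) {
    WorkerArgs *args = static_cast<WorkerArgs *>(p);
    int value;
    while (args->queue->pop(value)) {
        args->sum += value;  // the "job" is just adding the values up
    }
    return NULL;
}

// Pushes 1..n_items through n_workers threads and returns the total
// of everything the workers consumed.
long run_demo(int n_items, int n_workers) {
    WorkQueue queue;
    std::vector<pthread_t> threads(n_workers);
    std::vector<WorkerArgs> args(n_workers);
    for (int i = 0; i < n_workers; ++i) {
        args[i].queue = &queue;
        args[i].sum = 0;
        int rc = pthread_create(&threads[i], NULL, worker_main, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int v = 1; v <= n_items; ++v) {
        queue.push(v);
    }
    queue.shutdown();
    long total = 0;
    for (int i = 0; i < n_workers; ++i) {
        pthread_join(threads[i], NULL);
        total += args[i].sum;
    }
    return total;
}
```

Calling run_demo(100, 4) pushes the values 1 to 100 and lets four workers drain them. Each item is taken exactly once because both the emptiness check and the pop happen under the same lock.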
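I’m guessing you are seeing a bug that comes from assumptions about instruction ordering when you check whether the queue is empty. In the first example that check happens before the mutex is taken, so nothing acts as a memory barrier for it. When you turn up the optimization level, the compiler is free to reorder the read or to reuse a value it loaded earlier, and the loop can then spin forever on a stale “empty” result. Checking only while holding the mutex, as in your second example, avoids this.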
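The same reasoning would apply to any flag that is polled without the lock, such as t_exitSignal[tID] in the loops above. As a hedged sketch, assuming a C++11 compiler is available (the post does not say which one is used, and the declared type of t_exitSignal is not shown), std::atomic makes such a read safe. The names exit_signal, spin_until_exit and worker_sees_exit are hypothetical. Without C++11, reading the flag while holding the mutex is the simpler route.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Hypothetical stand-in for one t_exitSignal[tID] entry.
std::atomic<bool> exit_signal(false);

void *spin_until_exit(void *) {
    // An atomic load must be performed on every iteration, so the
    // compiler cannot hoist it out of the loop and reuse an old value.
    while (!exit_signal.load(std::memory_order_acquire)) {
        sched_yield();   // let other threads run while polling
    }
    return NULL;
}

// Starts a polling worker, raises the flag, and waits for the worker.
// The join can only complete if the worker observed the store.
bool worker_sees_exit() {
    exit_signal.store(false);
    pthread_t thread;
    int rc = pthread_create(&thread, NULL, spin_until_exit, NULL);
    assert(rc == 0);
    exit_signal.store(true, std::memory_order_release);
    rc = pthread_join(thread, NULL);
    return rc == 0;
}
```

This only covers a single flag. The queue itself still needs the mutex, because front() and pop() touch more than one word of state.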