I am writing a program that has 8 threads. I am implementing a barrier using a global count that each thread increments while it holds the lock. All threads wait in a while loop for this count to become 8, and when it does they are supposed to proceed. I am seeing that only the thread that took the count from 7 to 8 actually proceeds, while all other threads are stuck at the unlock statement that follows the increment. This happens only when one of the -O1, -O2 or -O3 optimization levels is turned on.
The code is:
    // some code
    pthread_spin_lock (&lcl_mutex_1);
    sync_count_1++; // global count
    pthread_spin_unlock (&lcl_mutex_1);
    while (isbreak_1 == 0) {
        if (sync_count_1 == 8) {
            cout << a << endl; // a is the argument that indicates the thread number
            isbreak_1 = 1;
        }
    }
    // some code
This entire process works fine when no optimizations are turned on.
Here’s what I verified. I compiled with -O3 and -g and put a breakpoint at the
    cout << a << endl;
line. I saw that the thread that updates the count to 8 is the only one to hit this breakpoint. When I used “info threads” to see the status of the other threads, all of them were stuck at the pthread_spin_unlock statement.
Any help to resolve this would be appreciated.
Adding the lock declaration and initialization:
    // global declaration
    pthread_spinlock_t lcl_mutex_1;
    // in main
    pthread_spin_init (&lcl_mutex_1, 0);
I compiled the code using
g++ -DUSE_SPINLOCK -O3 -g corr_coeff_parallel_v9.cpp -lpthread
Here is the gdb output as well:
[Thread debugging using libthread_db enabled]
[New Thread 0x40a00940 (LWP 30485)]
[New Thread 0x41401940 (LWP 30486)]
[New Thread 0x41e02940 (LWP 30487)]
[New Thread 0x42803940 (LWP 30488)]
[New Thread 0x43204940 (LWP 30489)]
[New Thread 0x43c05940 (LWP 30490)]
[New Thread 0x44606940 (LWP 30491)]
[New Thread 0x45007940 (LWP 30492)]
Time is 53 0 // these are some time measurements I made before the problematic section
Time is 51 1
Time is 51 4
Time is 51 5
Time is 51 2
Time is 51 6
Time is 51 3
[Thread 0x2aaaaaabfc10 (LWP 30482) exited]
[Switching to Thread 0x44606940 (LWP 30491)]
Breakpoint 1, calc_corr (t=0x6) at corr_coeff_parallel_v9.cpp:337
337 cout << a << endl;
(gdb) info threads
9 Thread 0x45007940 (LWP 30492) 0x00000000004033e4 in calc_corr (t=0x7) at corr_coeff_parallel_v9.cpp:334
* 8 Thread 0x44606940 (LWP 30491) calc_corr (t=0x6) at corr_coeff_parallel_v9.cpp:337
7 Thread 0x43c05940 (LWP 30490) 0x00000000004033e4 in calc_corr (t=0x5) at corr_coeff_parallel_v9.cpp:334
6 Thread 0x43204940 (LWP 30489) 0x00000000004033e4 in calc_corr (t=0x4) at corr_coeff_parallel_v9.cpp:334
5 Thread 0x42803940 (LWP 30488) 0x00000000004033e4 in calc_corr (t=0x3) at corr_coeff_parallel_v9.cpp:334
4 Thread 0x41e02940 (LWP 30487) 0x00000000004033e4 in calc_corr (t=0x2) at corr_coeff_parallel_v9.cpp:334
3 Thread 0x41401940 (LWP 30486) 0x00000000004033e4 in calc_corr (t=0x1) at corr_coeff_parallel_v9.cpp:334
2 Thread 0x40a00940 (LWP 30485) 0x00000000004033e4 in calc_corr (t=0x0) at corr_coeff_parallel_v9.cpp:334
(gdb)
The POSIX standard makes it undefined behavior to access an object in one thread while another thread is, or might be, modifying it. Your code does this by reading sync_count_1 in the while loop while another thread might be modifying it. With optimization turned on, the compiler is free to assume that nothing else changes the variable, read it once, and never reload it inside the loop, which would explain why the problem only shows up at -O1 and above. The simplest fix is to hold the spinlock during the read. Another solution would be to use a library (or compiler-specific intrinsic, or assembly code) that provides an atomic memory operation with defined inter-thread memory visibility semantics.
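As a rough illustration of the "hold the spinlock during the read" fix, here is a minimal sketch. The names lcl_mutex_1 and sync_count_1 come from the question; kThreads, passed, read_count, worker and run_barrier_demo are hypothetical names introduced only for this example, and the real calc_corr function presumably does much more. The sched_yield() call in the wait loop is an added courtesy, not something the question's code does.

```cpp
#include <pthread.h>
#include <sched.h>

// Hypothetical constant for this sketch: the number of threads in the question.
const int kThreads = 8;

pthread_spinlock_t lcl_mutex_1;  // same lock as in the question
int sync_count_1 = 0;            // only read or written while holding the lock
int passed = 0;                  // hypothetical: threads that got past the barrier

// Read the shared counter while holding the lock, so the read
// can never overlap with another thread's increment.
int read_count() {
    pthread_spin_lock(&lcl_mutex_1);
    int value = sync_count_1;
    pthread_spin_unlock(&lcl_mutex_1);
    return value;
}

void* worker(void*) {
    // Announce arrival at the barrier.
    pthread_spin_lock(&lcl_mutex_1);
    sync_count_1++;
    pthread_spin_unlock(&lcl_mutex_1);

    // Wait until every thread has arrived. Each iteration takes the
    // lock, so the counter is reloaded on every pass.
    while (read_count() != kThreads) {
        sched_yield();  // let other threads run while we wait
    }

    // Record that this thread got through (also under the lock).
    pthread_spin_lock(&lcl_mutex_1);
    passed++;
    pthread_spin_unlock(&lcl_mutex_1);
    return nullptr;
}

// Starts kThreads workers, waits for them, and returns how many
// got past the barrier.
int run_barrier_demo() {
    sync_count_1 = 0;
    passed = 0;
    pthread_spin_init(&lcl_mutex_1, PTHREAD_PROCESS_PRIVATE);

    pthread_t threads[kThreads];
    for (int i = 0; i < kThreads; ++i) {
        pthread_create(&threads[i], nullptr, worker, nullptr);
    }
    for (int i = 0; i < kThreads; ++i) {
        pthread_join(threads[i], nullptr);
    }

    pthread_spin_destroy(&lcl_mutex_1);
    return passed;
}
```

Calling run_barrier_demo() from main and building with the same flags as in the question (g++ -O3 -g ... -lpthread) should let all eight workers leave the wait loop. The isbreak_1 flag from the question is not needed in this version, because each thread leaves the loop based on its own locked read of the counter.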