Let’s imagine that I have a few worker threads that each run a loop like the following:
    while (1) {
        do_something();
        if (flag_isset())
            do_something_else();
    }
We have a couple of helper functions for checking and setting a flag:
    void flag_set()   { global_flag = 1; }
    void flag_clear() { global_flag = 0; }
    int  flag_isset() { return global_flag; }
Thus the threads keep calling do_something() in a busy loop, and if some other thread sets global_flag, the thread also calls do_something_else() (which could, for example, output progress or debugging information when requested by setting the flag from another thread).
My question is: Do I need to do something special to synchronize access to the global_flag? If yes, what exactly is the minimum work to do the synchronization in a portable way?
I have tried to figure this out by reading many articles but I am still not quite sure of the correct answer… I think it is one of the following:
A: No need to synchronize because setting or clearing the flag does not create race conditions:
We just need to define the flag as volatile to make sure that it is really read from the shared memory every time it is being checked:
volatile int global_flag;
It might not propagate to other CPU cores immediately but will sooner or later, guaranteed.
B: Full synchronization is needed to make sure that changes to the flag are propagated between threads:
Setting the shared flag on one CPU core does not necessarily make it visible to another core. We need to use a mutex to make sure that flag changes are always propagated by invalidating the corresponding cache lines on other CPUs. The code becomes as follows:
    #include <pthread.h>

    volatile int global_flag;
    pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* pthread_mutex_lock() and pthread_mutex_unlock() take a pointer to the mutex */
    void flag_set()   { pthread_mutex_lock(&flag_mutex); global_flag = 1; pthread_mutex_unlock(&flag_mutex); }
    void flag_clear() { pthread_mutex_lock(&flag_mutex); global_flag = 0; pthread_mutex_unlock(&flag_mutex); }

    int flag_isset()
    {
        int rc;
        pthread_mutex_lock(&flag_mutex);
        rc = global_flag;
        pthread_mutex_unlock(&flag_mutex);
        return rc;
    }
C: Synchronization is needed to make sure that changes to the flag are propagated between threads:
This is the same as B, but instead of using a mutex on both sides (reader and writer) we use it only on the writing side. Because the logic itself does not require synchronization, we just need to synchronize (invalidate other caches) when the flag is changed:
    #include <pthread.h>

    volatile int global_flag;
    pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* only the writers take the mutex; the reader accesses the flag directly */
    void flag_set()   { pthread_mutex_lock(&flag_mutex); global_flag = 1; pthread_mutex_unlock(&flag_mutex); }
    void flag_clear() { pthread_mutex_lock(&flag_mutex); global_flag = 0; pthread_mutex_unlock(&flag_mutex); }
    int  flag_isset() { return global_flag; }
This would avoid continuously locking and unlocking the mutex when we know that the flag is rarely changed. We are just using a side-effect of Pthreads mutexes to make sure that the change is propagated.
So, which one?
I think A and B are the obvious choices, B being safer. But how about C?
If C is ok, is there some other way of forcing the flag change to be visible on all CPUs?
There is one somewhat related question: Does guarding a variable with a pthread mutex guarantee it's also not cached? …but it does not really answer this.
The ‘minimum amount of work’ is an explicit memory barrier. The syntax depends on your compiler; on GCC you could do:
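A minimal sketch of what this could look like, assuming GCC's `__sync_synchronize()` builtin (a full memory barrier) and the helper names from the question. The placement of the barriers is one reasonable choice, not the only one:

```c
/* Sketch: the question's flag helpers with explicit full barriers.
   __sync_synchronize() is a GCC builtin; other compilers spell this differently. */
int global_flag;

void flag_set(void)
{
    global_flag = 1;
    __sync_synchronize();   /* the store is emitted before anything that follows */
}

void flag_clear(void)
{
    global_flag = 0;
    __sync_synchronize();
}

int flag_isset(void)
{
    int val;
    __sync_synchronize();   /* keep the read from being moved earlier */
    val = global_flag;
    __sync_synchronize();   /* and from being moved later */
    return val;
}
```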
These memory barriers accomplish two important goals:
They force a compiler flush. Consider a loop like the following:
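As an illustration, a loop of roughly this shape, where `ITERATIONS`, `local_counter` and `sum_with_flag` are made-up names and `flag_set()` is the barrier-free version from the question, assumed to be inlined:

```c
/* Sketch: a flag store inside a hot loop, with no barrier. */
enum { ITERATIONS = 1000000 };

int global_flag;

static inline void flag_set(void) { global_flag = 1; }   /* plain store, no barrier */

long long sum_with_flag(void)
{
    long long local_counter = 0;
    for (long long i = 0; i < ITERATIONS; i++) {
        flag_set();              /* assume this call is inlined */
        local_counter += i;
    }
    return local_counter;
}
```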
Without a barrier, a compiler might choose to optimize this to:
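Written out by hand, the transformed loop might look like the sketch below (names are illustrative). The single-threaded result is identical, which is why the compiler is allowed to do this, but other threads would not see the flag until the loop has finished:

```c
/* Sketch: what the compiler may effectively produce when nothing
   forces the store to happen on each iteration. */
enum { ITERATIONS = 1000000 };

int global_flag;

long long sum_with_flag_as_optimized(void)
{
    long long local_counter = 0;
    for (long long i = 0; i < ITERATIONS; i++)
        local_counter += i;
    global_flag = 1;   /* store moved out of the loop and performed once, at the end */
    return local_counter;
}
```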
Inserting a barrier forces the compiler to write the variable back immediately.
They force the CPU to order its writes and reads. This is not so much an issue with a single flag – most CPU architectures will eventually see a flag that’s set without CPU-level barriers. However the order might change. If we have two flags, and on thread A:
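One possible arrangement, given here only as a sketch: two plain `int` flags, with `flag_a` starting set and `flag_b` starting clear (the names and initial values are assumptions for illustration). The writer sets `flag_b` first and clears `flag_a` second, so in program order there is never a moment when both are clear:

```c
/* Sketch: writer side. No barrier between the two stores. */
int flag_a = 1;   /* starts set */
int flag_b = 0;   /* starts clear */

void thread_a_body(void)
{
    flag_b = 1;   /* first in program order */
    flag_a = 0;   /* second in program order */
}
```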
And on thread B:
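A sketch of the reader side, assuming two plain `int` flags where `flag_a` starts at 1 and `flag_b` starts at 0, and a writer that sets `flag_b` and then clears `flag_a` (names and values are illustrative). The reader checks `flag_a` first and `flag_b` second:

```c
/* Sketch: reader side. No barrier between the two loads. */
int flag_a = 1;   /* starts set */
int flag_b = 0;   /* starts clear */

/* Returns 1 if this thread observed both flags clear. */
int thread_b_saw_both_clear(void)
{
    int a = flag_a;   /* read first */
    int b = flag_b;   /* read second */
    return a == 0 && b == 0;
}
```

If stores and loads were always performed in program order, this function could never return 1 under those assumptions.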
Some CPU architectures allow these writes to be reordered, so thread B may see both flags as false (i.e., the write to flag A was moved ahead of the other write). This can be a problem if a flag protects, say, a pointer being valid. Memory barriers force an ordering on writes to protect against these problems.
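To make the pointer case concrete, here is a hedged sketch using GCC's `__sync_synchronize()`. The names `payload`, `payload_ready`, `publish` and `try_consume` are hypothetical, and the code assumes one writer thread and any number of readers:

```c
#include <stddef.h>

/* Sketch: a flag that guards the validity of a pointer. */
static int payload_storage;
int *payload = NULL;
int payload_ready = 0;

void publish(int value)
{
    payload_storage = value;
    payload = &payload_storage;
    __sync_synchronize();   /* data and pointer writes are ordered before the flag write */
    payload_ready = 1;
}

/* Returns 1 and stores the value in *out if the payload has been published. */
int try_consume(int *out)
{
    if (!payload_ready)
        return 0;
    __sync_synchronize();   /* flag read is ordered before the pointer read */
    *out = *payload;
    return 1;
}
```

Without the two barriers, a reader on a weakly ordered CPU could see `payload_ready` set while still reading a stale `payload`.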
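Note also that on some CPUs, it’s possible to use ‘acquire-release’ barrier semantics to further reduce overhead. The distinction matters little on x86, where ordinary loads and stores already behave as acquire and release operations at the hardware level. Older GCC versions needed inline assembly to express it; C11 <stdatomic.h> and GCC’s __atomic builtins now expose it directly.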
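With a C11 compiler, acquire and release orderings can also be requested through `<stdatomic.h>` rather than inline assembly. A minimal sketch, assuming C11 atomics are available and reusing the helper names from the question:

```c
#include <stdatomic.h>

/* Sketch: the flag as a C11 atomic with release stores and acquire loads. */
atomic_int global_flag;   /* file-scope object, zero-initialized */

void flag_set(void)
{
    atomic_store_explicit(&global_flag, 1, memory_order_release);
}

void flag_clear(void)
{
    atomic_store_explicit(&global_flag, 0, memory_order_release);
}

int flag_isset(void)
{
    return atomic_load_explicit(&global_flag, memory_order_acquire);
}
```

Because the flag is an atomic object, the compiler may not cache it in a register across loop iterations, and `volatile` is not needed for that purpose.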
A good overview of what memory barriers are and why they are needed can be found in the Linux kernel documentation directory (Documentation/memory-barriers.txt). Finally, note that this code is enough for a single flag, but if you want to synchronize against any other values as well, you must tread very carefully. A lock is usually the simplest way to do things.