Let’s imagine that I have a few worker threads such as follows: while (1)

Asked: May 25, 2026

Let’s imagine that I have a few worker threads such as follows:

while (1) {
    do_something();

    if (flag_isset())
        do_something_else();
}

We have a couple of helper functions for checking and setting a flag:

void flag_set()   { global_flag = 1; }
void flag_clear() { global_flag = 0; }
int  flag_isset() { return global_flag; }

Thus the threads keep calling do_something() in a busy loop, and if some other thread sets global_flag, the thread also calls do_something_else() (which could, for example, output progress or debugging information when another thread requests it by setting the flag).

My question is: Do I need to do something special to synchronize access to the global_flag? If yes, what exactly is the minimum work to do the synchronization in a portable way?

I have tried to figure this out by reading many articles, but I am still not quite sure of the correct answer. I think it is one of the following:

A: No need to synchronize because setting or clearing the flag does not create race conditions:

We just need to define the flag as volatile to make sure that it is really read from the shared memory every time it is being checked:

volatile int global_flag;

It might not propagate to other CPU cores immediately but will sooner or later, guaranteed.

B: Full synchronization is needed to make sure that changes to the flag are propagated between threads:

Setting the shared flag in one CPU core does not necessarily make it seen by another core. We need to use a mutex to make sure that flag changes are always propagated by invalidating the corresponding cache lines on other CPUs. The code becomes as follows:

volatile int    global_flag;
pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;

void flag_set()   { pthread_mutex_lock(&flag_mutex); global_flag = 1; pthread_mutex_unlock(&flag_mutex); }
void flag_clear() { pthread_mutex_lock(&flag_mutex); global_flag = 0; pthread_mutex_unlock(&flag_mutex); }

int  flag_isset()
{
    int rc;
    pthread_mutex_lock(&flag_mutex);
    rc = global_flag;
    pthread_mutex_unlock(&flag_mutex);
    return rc;
}
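For reference, a minimal self-contained sketch of option B as it might be compiled and exercised, assuming a POSIX system with pthreads (build with -pthread). The worker and run_option_b_demo functions are hypothetical additions: the worker simply spins until another thread sets the flag. volatile is dropped here because POSIX mutex lock and unlock calls already synchronize memory between threads.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Every access goes through flag_mutex, so volatile is not needed. */
static int global_flag;
static pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;

void flag_set(void)
{
    pthread_mutex_lock(&flag_mutex);
    global_flag = 1;
    pthread_mutex_unlock(&flag_mutex);
}

void flag_clear(void)
{
    pthread_mutex_lock(&flag_mutex);
    global_flag = 0;
    pthread_mutex_unlock(&flag_mutex);
}

int flag_isset(void)
{
    pthread_mutex_lock(&flag_mutex);
    int rc = global_flag;
    pthread_mutex_unlock(&flag_mutex);
    return rc;
}

/* Hypothetical worker: busy-loops until the flag is set, then exits
   (a stand-in for calling do_something_else()). */
static void *worker(void *unused)
{
    (void)unused;
    while (!flag_isset())
        sched_yield();          /* stands in for do_something() */
    return NULL;
}

/* Starts one worker, sets the flag from this thread and waits for the
   worker to notice it. Returns 1 on success, 0 if the thread failed. */
int run_option_b_demo(void)
{
    pthread_t tid;
    flag_clear();
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return 0;
    flag_set();
    pthread_join(tid, NULL);
    assert(flag_isset());       /* the worker only exits once it saw the flag */
    return 1;
}
```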

C: Synchronization is needed to make sure that changes to the flag are propagated between threads:

This is the same as B, but instead of using the mutex on both sides (reader and writer) we use it only on the writing side. Because the program logic does not require synchronization, we just need to synchronize (invalidate other caches) when the flag is changed:

volatile int    global_flag;
pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;

void flag_set()   { pthread_mutex_lock(&flag_mutex); global_flag = 1; pthread_mutex_unlock(&flag_mutex); }
void flag_clear() { pthread_mutex_lock(&flag_mutex); global_flag = 0; pthread_mutex_unlock(&flag_mutex); }

int  flag_isset() { return global_flag; }

This would avoid continuously locking and unlocking the mutex when we know that the flag is rarely changed. We are just using a side-effect of Pthreads mutexes to make sure that the change is propagated.

So, which one?

I think A and B are the obvious choices, B being safer. But how about C?

If C is ok, is there some other way of forcing the flag change to be visible on all CPUs?

There is one somewhat related question, "Does guarding a variable with a pthread mutex guarantee it's also not cached?", but it does not really answer this.

Editorial Team · Answer 1 · 2026-05-25T01:39:51+00:00

The ‘minimum amount of work’ is an explicit memory barrier. The syntax depends on your compiler; on GCC you could do:

void flag_set()   {
  global_flag = 1;
  __sync_synchronize();
}

void flag_clear() {
  global_flag = 0;
  __sync_synchronize();
}

int  flag_isset() {
  int val;
  // Prevent the read from migrating backwards
  __sync_synchronize();
  val = global_flag;
  // and prevent it from being propagated forwards as well
  __sync_synchronize();
  return val;
}
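For comparison, C11 offers a portable counterpart to the GCC-specific __sync_synchronize(): atomic_thread_fence(memory_order_seq_cst) from the stdatomic.h header. The sketch below mirrors the barrier placement above but swaps in standard C11 atomics for the GCC builtin; declaring the flag as atomic_int also makes the concurrent accesses well-defined under the C11 memory model, which a plain int is not. The full_barrier, worker and run_fence_demo helpers are hypothetical, and the demo assumes pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* atomic_int: concurrent reads and writes are well-defined in C11. */
static atomic_int global_flag;

/* Full fence, the standard-C counterpart of __sync_synchronize(). */
static void full_barrier(void) { atomic_thread_fence(memory_order_seq_cst); }

void flag_set(void)
{
    atomic_store_explicit(&global_flag, 1, memory_order_relaxed);
    full_barrier();
}

void flag_clear(void)
{
    atomic_store_explicit(&global_flag, 0, memory_order_relaxed);
    full_barrier();
}

int flag_isset(void)
{
    full_barrier();   /* keep the read from migrating backwards */
    int val = atomic_load_explicit(&global_flag, memory_order_relaxed);
    full_barrier();   /* ...and from being moved forwards */
    return val;
}

/* Hypothetical worker standing in for the question's busy loop. */
static void *worker(void *unused)
{
    (void)unused;
    while (!flag_isset())
        ;             /* do_something() would go here */
    return NULL;
}

/* Starts a worker, sets the flag and waits for the worker to see it. */
int run_fence_demo(void)
{
    pthread_t tid;
    flag_clear();
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return 0;
    flag_set();
    pthread_join(tid, NULL);
    assert(flag_isset());   /* the worker only exits once it saw the flag */
    return 1;
}
```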

These memory barriers accomplish two important goals:

They force a compiler flush. Consider a loop like the following:
```
 for (int i = 0; i < 1000000000; i++) {
   flag_set(); // assume this is inlined
   local_counter += i;
 }
```
Without a barrier, a compiler might choose to optimize this to:
```
 for (int i = 0; i < 1000000000; i++) {
   local_counter += i;
 }
 flag_set();
```
Inserting a barrier forces the compiler to write the variable back immediately.

They force the CPU to order its writes and reads. This is not so much an issue with a single flag: on most CPU architectures, other cores will eventually see a flag that has been set, even without CPU-level barriers. However, the order in which writes become visible might change. Suppose we have two flags, and on thread 1:
```
  // start with only flag A set
  flag_set_B();
  flag_clear_A();
```
And on thread 2:
```
  a = flag_isset_A();
  b = flag_isset_B();
  assert(a || b); // can be false!
```
Some CPU architectures allow these writes (and, on some, the reads) to be reordered, so you may see both flags as false (i.e., the write clearing flag A became visible before the write setting flag B). This can be a problem if a flag protects, say, the validity of a pointer. Memory barriers impose an ordering on writes to protect against these problems.
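The two-flag scenario can be turned into a small test harness. This sketch swaps in C11 atomics (relaxed loads and stores plus atomic_thread_fence) for the GCC builtin, keeping the same barrier placement as the answer's flag functions, and assumes pthreads. The count_violations harness is hypothetical. With the fences in place, C11 guarantees that a reader which sees flag A cleared also sees flag B set, so it should report zero violations; removing the fences does not guarantee that violations will show up in a short run, since that depends on the CPU and compiler.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int flag_A, flag_B;

/* Full fence: the C11 counterpart of __sync_synchronize(). */
static void barrier(void) { atomic_thread_fence(memory_order_seq_cst); }

static void flag_set_B(void)
{
    atomic_store_explicit(&flag_B, 1, memory_order_relaxed);
    barrier();
}

static void flag_clear_A(void)
{
    atomic_store_explicit(&flag_A, 0, memory_order_relaxed);
    barrier();
}

static int flag_isset(atomic_int *flag)
{
    barrier();
    int val = atomic_load_explicit(flag, memory_order_relaxed);
    barrier();
    return val;
}

static int flag_isset_A(void) { return flag_isset(&flag_A); }
static int flag_isset_B(void) { return flag_isset(&flag_B); }

/* Thread 1: set B, then clear A. */
static void *writer(void *unused)
{
    (void)unused;
    flag_set_B();
    flag_clear_A();
    return NULL;
}

/* Thread 2: read A, then B, and record whether a || b held. */
static void *reader(void *result)
{
    int a = flag_isset_A();
    int b = flag_isset_B();
    *(int *)result = a || b;
    return NULL;
}

/* Runs the scenario `rounds` times; returns how many rounds saw both
   flags clear, or -1 if a thread could not be created. */
int count_violations(int rounds)
{
    int violations = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t w, r;
        int ok = 1;
        atomic_store(&flag_A, 1);   /* start with only flag A set */
        atomic_store(&flag_B, 0);
        if (pthread_create(&r, NULL, reader, &ok) != 0)
            return -1;
        if (pthread_create(&w, NULL, writer, NULL) != 0) {
            pthread_join(r, NULL);
            return -1;
        }
        pthread_join(r, NULL);
        pthread_join(w, NULL);
        assert(atomic_load(&flag_A) == 0 && atomic_load(&flag_B) == 1);
        if (!ok)
            violations++;
    }
    return violations;
}
```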

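Note also that on some CPUs it's possible to use 'acquire-release' barrier semantics to further reduce overhead. This makes little difference on x86, however, where ordinary loads and stores already have acquire and release semantics. With the __sync builtins used above it would require inline assembly on GCC; GCC 4.7 and later also provide __atomic builtins that accept explicit acquire and release memory orders.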
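In C11 and later, the stdatomic.h header exposes acquire and release orderings directly. The sketch below swaps C11 atomics in for the answer's GCC builtins: flag_set() is a release store and flag_isset() an acquire load, so anything written before flag_set() (here a hypothetical payload variable) is visible to a thread once it observes the flag. The consumer and publish_demo helpers are illustrative and assume pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int global_flag;
static int payload;   /* hypothetical plain data published via the flag */

/* Release store: writes made before this call (e.g. to payload) become
   visible to any thread whose acquire load reads the 1. */
void flag_set(void)   { atomic_store_explicit(&global_flag, 1, memory_order_release); }
void flag_clear(void) { atomic_store_explicit(&global_flag, 0, memory_order_relaxed); }

/* Acquire load: pairs with the release store in flag_set(). */
int flag_isset(void)  { return atomic_load_explicit(&global_flag, memory_order_acquire); }

/* Spins until the flag is published, then reads the payload. */
static void *consumer(void *out)
{
    while (!flag_isset())
        ;                      /* busy-wait, as in the question's loop */
    *(int *)out = payload;     /* ordered after the acquire load */
    return NULL;
}

/* Publishes `value` to a consumer thread; returns what it read, or -1. */
int publish_demo(int value)
{
    pthread_t tid;
    int seen = -1;
    flag_clear();
    if (pthread_create(&tid, NULL, consumer, &seen) != 0)
        return -1;
    payload = value;           /* write the data first... */
    flag_set();                /* ...then publish it with a release store */
    pthread_join(tid, NULL);
    assert(seen == value);
    return seen;
}
```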

A good overview of what memory barriers are and why they are needed can be found in the Linux kernel documentation directory (Documentation/memory-barriers.txt). Finally, note that this code is enough for a single flag, but if you want to synchronize any other values against it as well, you must tread very carefully. A lock is usually the simplest way to do things.
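When the flag guards other data, the lock approach might look like the following minimal sketch, assuming pthreads. The request_set, request_take and request_demo names, the fixed-size message buffer and MSG_LEN are hypothetical. The point is that the flag and the data it guards are read and written under the same mutex, so a reader can never see the flag without the matching message.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

#define MSG_LEN 64

/* Hypothetical request state: the flag and its data share one lock. */
static pthread_mutex_t req_mutex = PTHREAD_MUTEX_INITIALIZER;
static int  req_flag;
static char req_msg[MSG_LEN];

/* Stores the message and raises the flag in one critical section. */
void request_set(const char *msg)
{
    pthread_mutex_lock(&req_mutex);
    snprintf(req_msg, sizeof req_msg, "%s", msg);
    req_flag = 1;
    pthread_mutex_unlock(&req_mutex);
}

/* If a request is pending, copies its message into buf (MSG_LEN bytes),
   clears the flag and returns 1; otherwise returns 0. */
int request_take(char *buf)
{
    int pending;
    pthread_mutex_lock(&req_mutex);
    pending = req_flag;
    if (pending) {
        memcpy(buf, req_msg, MSG_LEN);
        req_flag = 0;
    }
    pthread_mutex_unlock(&req_mutex);
    return pending;
}

/* Worker: busy-loops until a request arrives. */
static void *worker(void *buf)
{
    while (!request_take(buf))
        sched_yield();         /* stands in for do_something() */
    return NULL;
}

/* Sends one request to a worker thread; returns 1 on success. */
int request_demo(const char *msg)
{
    pthread_t tid;
    char buf[MSG_LEN] = "";
    if (pthread_create(&tid, NULL, worker, buf) != 0)
        return 0;
    request_set(msg);
    pthread_join(tid, NULL);
    /* The message always arrives together with the flag. */
    assert(strncmp(buf, msg, MSG_LEN - 1) == 0);
    return 1;
}
```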
