How can I be sure that data written by multiple CPU cores while a mutex lock is held is synchronized across the L1 caches of all cores? I am not talking about the variable that represents the lock; I am talking about the memory locations that are accessed while the lock is held.
This is for Linux, x86_64, and my code is:
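#include <sys/types.h>
#include "dlog.h"

uint *dlog_line;
volatile int dlog_lock;

char *dlog_get_new_line(void) {
    uint val;

    /* Spin until the lock is acquired (0 -> 1); the builtin is a full barrier. */
    while (!__sync_bool_compare_and_swap(&dlog_lock, 0, 1))
        ;

    /* Critical section: only the lock holder touches *dlog_line. */
    val = *dlog_line;
    if (val == DT_DLOG_MAX_LINES) val = 0;
    *dlog_line = val;

    /* Release the lock; earlier stores become visible before the lock reads 0. */
    __sync_lock_release(&dlog_lock);
}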
Here, inside the dlog_get_new_line() function, I use a gcc builtin function, so there shouldn’t be any problem with acquiring the lock. But how can I ensure that when the lock is released, the value pointed to by dlog_line propagates into the L1 caches of all the other CPU cores in the system?
I do not use pthreads; each process runs on a different CPU core.
What you’re interested in is called cache coherence. This is done automatically by the hardware.
So in short, you don’t have to do anything if you are correctly using __sync_bool_compare_and_swap() (or any other locking intrinsic).
As an oversimplified explanation, the thread will not return from the call to __sync_bool_compare_and_swap() until all the other processors are able to see the new value or are aware that their local copy is out-of-date. The builtin also acts as a full memory barrier, so neither the compiler nor the CPU moves memory accesses across it.
If you’re interested in what happens underneath (in the hardware), there are various cache coherence algorithms that are used to ensure that a core doesn’t read an outdated copy of data.
Here’s a partial list of commonly taught protocols: MSI, MESI, and MOESI.
Modern hardware will typically have much more complicated algorithms for it.
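To make the point about the locking intrinsics concrete, here is a minimal sketch, assuming Linux, GCC, and an anonymous shared mapping created before fork(). Several processes increment a plain (non-atomic) counter under a spinlock built from the `__sync` builtins. The names `shared_state`, `spin_lock`, `spin_unlock`, and `run_counter_demo` are hypothetical and not part of the original question; this is an illustration, not a production-grade lock.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of the state shared between the processes. */
struct shared_state {
    volatile int lock;     /* 0 = free, 1 = held */
    unsigned long counter; /* plain data protected by the lock */
};

static void spin_lock(volatile int *lock) {
    /* Full barrier; spin until this process changes 0 -> 1. */
    while (!__sync_bool_compare_and_swap(lock, 0, 1))
        ;
}

static void spin_unlock(volatile int *lock) {
    /* Release barrier, then store 0 to the lock word. */
    __sync_lock_release(lock);
}

/*
 * Forks nprocs children; each increments the shared counter iters times
 * while holding the lock. Returns the final counter, or -1 if the mapping
 * or any fork() failed.
 */
long run_counter_demo(int nprocs, long iters) {
    assert(nprocs > 0 && iters >= 0);

    struct shared_state *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return -1;
    s->lock = 0;
    s->counter = 0;

    int started = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Child: the mapping is shared, so updates are seen by all. */
            for (long j = 0; j < iters; j++) {
                spin_lock(&s->lock);
                s->counter++; /* not atomic by itself; the lock serializes it */
                spin_unlock(&s->lock);
            }
            _exit(0);
        }
        started++;
    }

    /* Parent: wait for every child that was actually started. */
    for (int i = 0; i < started; i++)
        wait(NULL);

    long result = (started == nprocs) ? (long)s->counter : -1;
    munmap(s, sizeof *s);
    return result;
}
```

With this sketch, a call such as `run_counter_demo(4, 10000)` is expected to return 40000: no increment is lost even though the counter itself is an ordinary variable, because the barriers in the acquire and release builtins plus hardware cache coherence make each holder's store visible to the next holder. No explicit cache flush appears anywhere in the code.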