If I have some code that looks something like:
#include <stdbool.h>
#include <pthread.h>

typedef struct {
    bool some_flag;
    pthread_cond_t c;
    pthread_mutex_t m;
} foo_t;
// I assume the mutex has already been locked, and will be unlocked
// some time after this function returns. For clarity. Definitely not
// out of laziness ;)
void check_flag(foo_t* f) {
    while (f->some_flag)
        pthread_cond_wait(&f->c, &f->m);
}
Is there anything in the C standard preventing an optimizer from rewriting check_flag as:
void check_flag(foo_t* f) {
    bool cache = f->some_flag;
    while (cache)
        pthread_cond_wait(&f->c, &f->m);
}
In other words, does the generated code have to follow the f pointer every time through the loop, or is the compiler free to pull the dereference out?
If it is free to pull it out, is there any way to prevent this? Do I need to sprinkle a volatile keyword somewhere? It can’t be check_flag’s parameter because I plan on having other variables in this struct that I don’t mind the compiler optimizing like this.
Might I have to resort to:
void check_flag(foo_t* f) {
    volatile bool* cache = &f->some_flag;
    while (*cache)
        pthread_cond_wait(&f->c, &f->m);
}
Normally, you should lock the pthread mutex before waiting on the condition variable, because the pthread_cond_wait call releases the mutex while it waits (and reacquires it before returning). So your check_flag function should be rewritten to lock the mutex before the loop and unlock it afterwards, to conform to the semantics of pthread condition variables.
Concerning the question of whether or not the compiler is allowed to optimize away the read of the flag field, this answer explains it in more detail than I can.
Basically, the compiler knows about the semantics of pthread_cond_wait, pthread_mutex_lock and pthread_mutex_unlock. It knows that it can’t optimize away memory reads across those calls (the call to pthread_cond_wait in this example), so the flag is reloaded on every pass through the loop. There is no notion of a memory barrier here, just special knowledge of certain functions and some rules to follow in their presence.
There is another thing protecting you, this time from optimizations performed by the processor. Your average processor is capable of reordering memory accesses (reads and writes) provided that the semantics are preserved, and it does so all the time (as it increases performance). However, this breaks down when more than one processor can access the same memory address. A memory barrier is just an instruction to the processor telling it that it can’t move the reads and writes that were issued before the barrier and execute them after the barrier. It has to finish them now.
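Coming back to the first point, here is a rough sketch of what a check_flag that takes the lock itself could look like. It is only an illustration under a few assumptions: clear_flag and clear_flag_thread are hypothetical helpers that are not part of the question, and the waiter is assumed to wake up once some_flag becomes false.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool some_flag;
    pthread_cond_t c;
    pthread_mutex_t m;
} foo_t;

/* Blocks until some_flag is false. Locks and unlocks the mutex itself. */
void check_flag(foo_t *f) {
    pthread_mutex_lock(&f->m);
    /* pthread_cond_wait releases m while sleeping and reacquires it before
       returning, so some_flag is only ever read with the mutex held.
       The loop also copes with spurious wakeups. */
    while (f->some_flag)
        pthread_cond_wait(&f->c, &f->m);
    pthread_mutex_unlock(&f->m);
}

/* Hypothetical helper (not in the question): clears the flag under the
   same mutex and wakes every thread waiting on the condition variable. */
void clear_flag(foo_t *f) {
    pthread_mutex_lock(&f->m);
    f->some_flag = false;
    pthread_cond_broadcast(&f->c);
    pthread_mutex_unlock(&f->m);
}

/* Hypothetical thread entry point so clear_flag can be run via pthread_create. */
void *clear_flag_thread(void *arg) {
    clear_flag((foo_t *)arg);
    return NULL;
}
```

To try it, initialize m and c with pthread_mutex_init and pthread_cond_init, set some_flag to true, start a thread on clear_flag_thread and call check_flag from the main thread. No volatile is needed here, since every access to some_flag happens between a lock and an unlock of the same mutex.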