Suppose that we have the following bit of code:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
void guarantee(bool cond, const char *msg) {
if (!cond) {
fprintf(stderr, "%s", msg);
exit(1);
}
}
bool do_shutdown = false; // Not volatile!
pthread_cond_t shutdown_cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t shutdown_cond_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Called in Thread 1. Intended behavior is to block until
trigger_shutdown() is called. */
void wait_for_shutdown_signal() {
int res;
res = pthread_mutex_lock(&shutdown_cond_mutex);
guarantee(res == 0, "Could not lock shutdown cond mutex");
while (!do_shutdown) { // while loop guards against spurious wakeups
res = pthread_cond_wait(&shutdown_cond, &shutdown_cond_mutex);
guarantee(res == 0, "Could not wait for shutdown cond");
}
res = pthread_mutex_unlock(&shutdown_cond_mutex);
guarantee(res == 0, "Could not unlock shutdown cond mutex");
}
/* Called in Thread 2. */
void trigger_shutdown() {
int res;
res = pthread_mutex_lock(&shutdown_cond_mutex);
guarantee(res == 0, "Could not lock shutdown cond mutex");
do_shutdown = true;
res = pthread_cond_signal(&shutdown_cond);
guarantee(res == 0, "Could not signal shutdown cond");
res = pthread_mutex_unlock(&shutdown_cond_mutex);
guarantee(res == 0, "Could not unlock shutdown cond mutex");
}
Can a standards-compliant C/C++ compiler ever cache the value of do_shutdown in a register across the call to pthread_cond_wait()? If not, which standards/clauses guarantee this?
The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown. This seems rather improbable, but I know of no standard that prevents it.
In practice, do any C/C++ compilers cache the value of do_shutdown in a register across the call to pthread_cond_wait()?
Which function calls is the compiler guaranteed not to cache the value of do_shutdown across? It’s clear that if the function is declared externally and the compiler cannot access its definition, it must make no assumptions about its behavior so it cannot prove that it does not access do_shutdown. If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?
Of course the current C and C++ standards say nothing on the subject.
As far as I know, Posix still avoids formally defining a concurrency model (I may be out of date, though, in which case apply my answer only to earlier Posix versions). Therefore what it does say has to be read with a little sympathy – it does not precisely lay out the requirements in this area, but implementers are expected to “know what it means” and do something that makes threads usable.
When the standard says that mutexes “synchronize memory access”, implementations must assume that this means changes made under the lock in one thread will be visible under the lock in other threads. In other words, it’s necessary (although not sufficient) that synchronization operations include memory barriers of one kind or another, and necessary behaviour of a memory barrier is that it must assume globals can change.
Threads Cannot be Implemented as a Library covers some specific issues that are required for a pthreads to actually be usable, but are not explicitly stated in the Posix standard at the time of writing (2004). It becomes quite important whether your compiler-writer, or whoever defined the memory model for your implementation, agrees with Boehm what “usable” means, in terms of allowing the programmer to “reason convincingly about program correctness”.
Note that Posix doesn’t guarantee a coherent memory cache, so if your implementation perversely wants to cache
do_somethingin a register in your code, then even if you marked it volatile, it might perversely choose not to dirty your CPU’s local cache between the synchronizing operation and readingdo_something. So if the writer thread is running on a different CPU with its own cache, you might not see the change even then.That’s (one reason) why threads cannot be implemented merely as a library. This optimization of fetching a volatile global only from local CPU cache would be valid in a single-threaded C implementation[*], but breaks multi-threaded code. Hence, the compiler needs to “know about” threads, and how they affect other language features (for an example outside pthreads: on Windows, where cache is always coherent, Microsoft spells out the additional semantics that it grants
volatilein multi-threaded code). Basically, you have to assume that if your implementation has gone to the trouble of providing the pthreads functions, then it will go to the trouble of defining a workable memory model in which locks actually synchronize memory access.Yes to all of this – if the object is non-volatile, and the compiler can prove that this thread doesn’t modify it (either through its name or through an aliased pointer), and if no memory barriers occur, then it can reuse previous values. There can and will be other implementation-specific conditions that sometimes stop it, of course.
[*] provided that the implementation knows the global is not located at some “special” hardware address which requires that reads always go through cache to main memory in order to see the results of whatever hardware op affects that address. But to put a global at any such location, or to make its location special with DMA or whatever, requires implementation-specific magic. Absent any such magic the implementation in principle can sometimes know this.