I tried looking for details on this, and I even read the standard's sections on mutexes and atomics, but I still couldn't understand the C++11 memory model's visibility guarantees.
From what I understand, a very important feature of a mutex BESIDES mutual exclusion is ensuring visibility. That is, it is not enough that only one thread at a time increments the counter; it is important that the thread increments the value that was stored by the thread that last held the mutex (I really don't know why people don't mention this more when discussing mutexes, maybe I had bad teachers :)).
So from what I can tell, atomics don't enforce immediate visibility:
(from the person who maintains boost::thread and has implemented a C++11 thread and mutex library):
A fence with memory_order_seq_cst does not enforce immediate
visibility to other threads (and neither does an MFENCE instruction).
The C++0x memory ordering constraints are just that — ordering
constraints. memory_order_seq_cst operations form a total order, but
there are no restrictions on what that order is, except that it must
be agreed on by all threads, and it must not violate other ordering
constraints. In particular, threads may continue to see “stale” values
for some time, provided they see values in an order consistent with
the constraints.
And I’m OK with that. But the problem is that I have trouble understanding which C++11 constructs regarding atomics are “global” and which only ensure consistency on atomic variables.
In particular, I have trouble understanding which (if any) of the following memory orderings guarantee that there will be a memory fence before and after loads and stores:
http://www.stdthread.co.uk/doc/headers/atomic/memory_order.html
From what I can tell, std::memory_order_seq_cst inserts a memory barrier, while the others only enforce ordering of the operations on a particular memory location.
So can somebody clear this up? I presume a lot of people are going to write horrible bugs using std::atomic, especially if they don't use the default (std::memory_order_seq_cst) memory ordering.
2. If I’m right, does that mean that the second line is redundant in this code:
atomicVar.store(42);
std::atomic_thread_fence(std::memory_order_seq_cst);
3. Do std::atomic_thread_fences have the same requirements as mutexes, in the sense that to ensure sequential consistency on non-atomic variables one must do
std::atomic_thread_fence(std::memory_order_seq_cst);
before loads and
std::atomic_thread_fence(std::memory_order_seq_cst);
after stores?
4. Is
{
regularSum+=atomicVar.load();
regularVar1++;
regularVar2++;
}
//...
{
regularVar1++;
regularVar2++;
atomicVar.store(74656);
}
equivalent to
std::mutex mtx;
{
std::unique_lock<std::mutex> ul(mtx);
regularSum+=nowRegularVar;
regularVar1++;
regularVar2++;
}
//..
{
std::unique_lock<std::mutex> ul(mtx);
regularVar1++;
regularVar2++;
nowRegularVar=(74656);
}
I think not, but I would like to be sure.
EDIT:
5. Can the assert fire?
Only two threads exist.
atomic<int*> p=nullptr;
first thread writes
{
nonatomic_p=(int*) malloc(16*1024*sizeof(int));
for(int i=0;i<16*1024;++i)
nonatomic_p[i]=42;
p=nonatomic_p;
}
second thread reads
{
while (p==nullptr)
{
}
assert(p[1234]==42);//1234-random idx in array
}
If you like to deal with fences, then a.load(memory_order_acquire) is equivalent to a.load(memory_order_relaxed) followed by atomic_thread_fence(memory_order_acquire). Similarly, a.store(x,memory_order_release) is equivalent to a call to atomic_thread_fence(memory_order_release) before a call to a.store(x,memory_order_relaxed). memory_order_consume is a special case of memory_order_acquire, for dependent data only. memory_order_seq_cst is special, and forms a total order across all memory_order_seq_cst operations. Mixed with the others it is the same as an acquire for a load, and a release for a store. memory_order_acq_rel is for read-modify-write operations, and is equivalent to an acquire on the read part and a release on the write part of the RMW.
The use of ordering constraints on atomic operations may or may not result in actual fence instructions, depending on the hardware architecture. In some cases the compiler will generate better code if you put the ordering constraint on the atomic operation rather than using a separate fence.
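The fence formulation described above can be sketched roughly as follows. This is only an illustration: the names payload, publish_with_fence, consume_with_fence and run_fence_demo are hypothetical, and the fence form is, if anything, slightly stronger than putting the ordering on the operation itself.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> a{0};  // the atomic variable "a" from the text
int payload = 0;        // hypothetical plain (non-atomic) data

// Release fence before a relaxed store, in place of
// a.store(1, std::memory_order_release).
void publish_with_fence() {
    payload = 7;
    std::atomic_thread_fence(std::memory_order_release);
    a.store(1, std::memory_order_relaxed);
}

// Relaxed load followed by an acquire fence, in place of
// a.load(std::memory_order_acquire).
int consume_with_fence() {
    while (a.load(std::memory_order_relaxed) != 1) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;  // ordered after the write to payload
}

// Hypothetical driver: runs one writer and one reader thread.
int run_fence_demo() {
    a.store(0);
    payload = 0;
    int seen = -1;
    std::thread reader([&] { seen = consume_with_fence(); });
    std::thread writer(publish_with_fence);
    writer.join();
    reader.join();
    return seen;
}
```

Calling run_fence_demo() should return the value written to payload, because the release fence and the acquire fence synchronize through the relaxed store and load of a.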
On x86, loads are always acquire, and stores are always release. memory_order_seq_cst requires stronger ordering with either an MFENCE instruction or a LOCK prefixed instruction (there is an implementation choice here as to whether to make the store have the stronger ordering or the load). Consequently, standalone acquire and release fences are no-ops, but atomic_thread_fence(memory_order_seq_cst) is not (again requiring an MFENCE or LOCKed instruction).
An important effect of the ordering constraints is that they order other operations.
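A minimal sketch of how ordering constraints on one atomic order operations on other, non-atomic data. The names thread_1, thread_2, ready and i are the ones used in the explanation that follows; the function bodies and the run_example helper are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready(false);
int i = 0;  // plain, non-atomic variable

void thread_1() {
    i = 42;                                        // ordinary store
    ready.store(true, std::memory_order_release);  // publishes the store to i
}

void thread_2() {
    // Spin until the release store becomes visible.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    assert(i == 42);  // ordered after the store to i
}

// Hypothetical driver that runs both threads once.
int run_example() {
    std::thread t2(thread_2);
    std::thread t1(thread_1);
    t1.join();
    t2.join();
    return i;
}
```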
thread_2 spins until it reads true from ready. Since the store to ready in thread_1 is a release, and the load is an acquire, the store synchronizes-with the load, and the store to i happens-before the load from i in the assert, and the assert will not fire.
2) The second line in your snippet, the std::atomic_thread_fence(std::memory_order_seq_cst) after atomicVar.store(42), is indeed potentially redundant, because the store to atomicVar uses memory_order_seq_cst by default. However, if there are other non-memory_order_seq_cst atomic operations on this thread then the fence may have consequences. For example, it would act as a release fence for a subsequent a.store(x,memory_order_relaxed).
3) Fences and atomic operations do not work like mutexes. You can use them to build mutexes, but they do not work like them. You do not have to ever use atomic_thread_fence(memory_order_seq_cst). There is no requirement that any atomic operations are memory_order_seq_cst, and ordering on non-atomic variables can be achieved without, as in the example above.
4) No, these are not equivalent. The atomic load and store do not provide mutual exclusion, so both blocks can run concurrently and modify regularVar1 and regularVar2 at the same time. Your snippet without the mutex lock is thus a data race and undefined behaviour.
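To picture the remark in 3) that atomics can be used to build a mutex, here is a minimal spinlock sketch. It assumes std::atomic_flag with acquire on lock and release on unlock; SpinLock and count_with_spinlock are hypothetical names, and this is not meant as a replacement for std::mutex.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical spinlock built from an atomic flag.
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Acquire: later accesses cannot move before taking the lock.
        while (flag.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Release: earlier accesses become visible to the next owner.
        flag.clear(std::memory_order_release);
    }
};

// Two threads increment a plain int under the lock.
int count_with_spinlock(int per_thread) {
    SpinLock lock;
    int counter = 0;  // non-atomic, protected by the lock
    auto work = [&] {
        for (int n = 0; n < per_thread; ++n) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

The mutual exclusion comes from test_and_set being an atomic read-modify-write; the acquire and release orderings supply the visibility that the question attributes to mutexes.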
5) No, your assert cannot fire. With the default memory ordering of memory_order_seq_cst, the store and load from the atomic pointer p work like the store and load in my example above, and the stores to the array elements are guaranteed to happen-before the reads.
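A compilable sketch of the scenario in 5), assuming the writer publishes the filled array through the atomic pointer with the default memory_order_seq_cst. The names writer, reader, kCount and run_publish_demo are hypothetical, and new[] is used here in place of malloc.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

constexpr std::size_t kCount = 16 * 1024;
std::atomic<int*> p{nullptr};

void writer() {
    int* nonatomic_p = new int[kCount];
    for (std::size_t i = 0; i < kCount; ++i) {
        nonatomic_p[i] = 42;  // plain stores to the array
    }
    p.store(nonatomic_p);     // seq_cst by default, acts as a release
}

int reader() {
    int* local = nullptr;
    // Spin until the pointer is published; the load acts as an acquire.
    while ((local = p.load()) == nullptr) {
        std::this_thread::yield();
    }
    return local[1234];       // ordered after the writer's array stores
}

// Hypothetical driver: one writer, one reader, then cleanup.
int run_publish_demo() {
    int seen = 0;
    std::thread r([&] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    delete[] p.exchange(nullptr);
    return seen;
}
```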