One of the docs for atomic variables in C++0x says the following when describing memory order:
Release-Acquire Ordering
On strongly-ordered systems (x86, SPARC, IBM mainframe), release-acquire ordering is
automatic. No additional CPU instructions are issued for this synchronization mode, only
certain compiler optimizations are affected…
First, is it true that x86 follows strict memory ordering?
That seems very inefficient to impose all the time. Does it mean every read and write has a fence?
Also, if I have an aligned int on an x86 system, do atomic variables serve any purpose at all?
Yes, it’s true that x86 has strict memory ordering; see Volume 3A, Chapter 8.2 of the Intel manuals. Older x86 processors such as the 386 provided truly strict ordering (called strong ordering) semantics, while more modern x86 processors relax this slightly in a few cases, but not in ways you normally need to worry about. For example, the Pentium and 486 allow read cache misses to go ahead of buffered writes when the writes are cache hits (and are therefore to different addresses from the reads). No explicit fence is attached to each read and write; the ordering is a property of how the hardware makes loads and stores visible.
Yes, it can be inefficient. Because of this, high-performance software is sometimes written only for other architectures with looser memory ordering requirements.
Yes, atomic variables still serve a purpose on x86. They have special semantics with the compiler such that a typical read-modify-write operation happens atomically. If you have two threads simultaneously incrementing an atomic variable (by which I mean a variable of type std::atomic<T> in C++11), you can be assured that the value will end up 2 larger. Without std::atomic, you might end up with the wrong value: an increment is a load, an add and a store, so both threads can load the same old value (or one can keep it cached in a register) and one increment is lost, even though the store to memory is itself atomic on x86.