And how much faster or slower is it compared to an uncontested atomic variable operation (such as one on C++'s std::atomic<T>)?
Also, how much slower are contested atomic variables relative to the uncontested lock?
The architecture I’m working on is x86-64.
There’s a project on GitHub whose purpose is to measure this on different platforms. Unfortunately, after my master’s thesis I never really had the time to follow up on it, but at least the rudimentary code is there.
It measures pthreads and OpenMP locks, compared to the __sync_fetch_and_add intrinsic.
From what I remember, we were expecting a pretty big difference between locks and atomic operations (~ an order of magnitude) but the real difference turned out to be very small.
However, measuring now on my system yields results which reflect my original guess, namely that (regardless of whether pthreads or OpenMP is used) atomic operations are about five times faster, and a single locked increment operation takes about 35ns (this includes acquiring the lock, performing the increment, and releasing the lock).
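If you want to get a rough number for your own machine, here is a minimal single-threaded sketch of the uncontended case. It is not the code from the GitHub project: it swaps in std::mutex and std::atomic from the C++ standard library in place of pthreads/OpenMP locks and __sync_fetch_and_add, and the names ns_per_op, measure_uncontended, print_uncontended and UncontendedResult are made up for illustration. It only covers the uncontested case; measuring contention would need several threads hammering the same variable.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <mutex>

// Hypothetical helper: average wall-clock nanoseconds per call of op(),
// measured on the calling thread only (so nothing is ever contended).
template <typename Op>
double ns_per_op(std::uint64_t iterations, Op op) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        op();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

struct UncontendedResult {
    double mutex_ns;             // lock + increment + unlock
    double atomic_ns;            // one atomic fetch_add
    std::uint64_t mutex_count;   // final value of the mutex-guarded counter
    std::uint64_t atomic_count;  // final value of the atomic counter
};

UncontendedResult measure_uncontended(std::uint64_t iterations) {
    std::mutex m;
    // volatile so the compiler cannot collapse the increments into one store.
    volatile std::uint64_t guarded = 0;
    std::atomic<std::uint64_t> counter{0};

    // Acquire the lock, increment, release the lock (via lock_guard).
    const double mutex_ns = ns_per_op(iterations, [&] {
        std::lock_guard<std::mutex> lock(m);
        guarded = guarded + 1;
    });

    // Sequentially consistent atomic increment, no lock involved.
    const double atomic_ns = ns_per_op(iterations, [&] {
        counter.fetch_add(1);
    });

    return {mutex_ns, atomic_ns, guarded, counter.load()};
}

// Prints both averages; the numbers depend on CPU, compiler and flags.
void print_uncontended(std::uint64_t iterations) {
    const UncontendedResult r = measure_uncontended(iterations);
    std::printf("mutex:  %.2f ns per locked increment\n", r.mutex_ns);
    std::printf("atomic: %.2f ns per fetch_add\n", r.atomic_ns);
}
```

Calling print_uncontended(10000000) from main (built with optimizations, e.g. -O2) should give you a per-operation figure for each variant. Treat it as a ballpark only: loop overhead and timer resolution are included, and nothing here pins the thread or warms up the cache.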