Problem Background
The code in question is a C++ implementation. In our code base, certain critical sections use asm volatile ("mfence" ::: "memory").
My understanding of memory barriers is:
- They are used to ensure complete/ordered execution of the instructions around them.
- They help avoid the classical thread synchronization problem (Wiki link).
Question
- Is pthread_mutex faster than a memory barrier when the memory fence is used to avoid thread synchronization problems? I have read material indicating that pthread mutexes use memory synchronization.
PS:
- In our code, asm volatile ("mfence" ::: "memory") is used after 10-15 lines of C++ code (in a member function). So my doubt is: maybe a mutex implementation of the memory synchronization gives better performance than the MB in user-implemented code (w.r.t. the scope of the MB).
- We are using SUSE Linux 10, 2.6.16.46, smp#1, x86_64 with a quad core processor.
pthread mutexes are guaranteed to be slower than a memory fence instruction (I can’t say how much slower; that is entirely platform dependent). The reason is simply that, in order to be POSIX-compliant, mutexes must provide memory guarantees. POSIX mutexes have strong memory guarantees, and thus I can’t see how they could be implemented without such fences*.
If you’re looking for practical advice: I use fences in many places instead of mutexes and have timed both of them frequently. pthread mutexes are very slow on Linux compared to just a raw memory fence (of course, they do a lot more, so be careful what you are actually comparing).
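A rough way to make this kind of comparison yourself is sketched below. It is only an illustration under several assumptions: std::mutex stands in for a pthread mutex (on Linux it is typically a thin wrapper around one), std::atomic_thread_fence(std::memory_order_seq_cst) stands in for the inline mfence (on x86-64 it typically compiles to mfence or an equivalent locked instruction), and the helper names average_ns, fence_cost_ns and mutex_cost_ns are made up for this example. It measures only the uncontended, single-threaded cost, which is the most favourable case for the mutex.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical helper: average cost in nanoseconds of one call to op(),
// measured over `iterations` back-to-back calls (iterations must be > 0).
template <typename Op>
double average_ns(std::uint64_t iterations, Op op) {
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        op();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Cost of a full (sequentially consistent) fence, used here as a
// portable stand-in for asm volatile ("mfence" ::: "memory").
double fence_cost_ns(std::uint64_t iterations) {
    return average_ns(iterations, [] {
        std::atomic_thread_fence(std::memory_order_seq_cst);
    });
}

// Cost of one uncontended lock/unlock pair on a mutex.
double mutex_cost_ns(std::uint64_t iterations) {
    std::mutex m;
    return average_ns(iterations, [&m] {
        m.lock();
        m.unlock();
    });
}
```

Calling fence_cost_ns(10000000) and mutex_cost_ns(10000000) from a small driver and printing both values gives a first impression; the numbers depend heavily on the CPU, the compiler flags and the libc version, and a contended mutex behaves very differently.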
Note, however, that certain atomic operations, in particular those in C++11, could, and in practice will, be faster than placing full fences everywhere yourself. In this case the compiler/library understands the architecture and need not use a full fence in order to provide the memory guarantees.
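As an illustration of that point, here is a minimal sketch of publishing data to another thread with C++11 release/acquire atomics instead of an explicit full fence. The function name publish_and_read and its variables are invented for this example. On x86-64, a release store and an acquire load usually compile to ordinary moves with no mfence at all, although that is a property of the target and the compiler rather than something the standard promises.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: the main thread writes a plain int, then publishes it
// through an atomic flag; the consumer thread waits for the flag and reads it.
int publish_and_read(int value) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
    int seen = -1;

    std::thread consumer([&] {
        // Acquire load: once this observes true, the write to payload
        // made before the release store is guaranteed to be visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    payload = value;
    // Release store: orders the payload write before the flag write.
    ready.store(true, std::memory_order_release);

    consumer.join();  // join() also makes the write to `seen` visible here
    return seen;
}
```

The ordering is attached to the flag itself, so the compiler can pick the cheapest instruction sequence that satisfies it on the target, instead of being forced to emit a full barrier.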
Also note, I’m talking about very low-level performance of the lock itself. You need to be profiling to the nanosecond level.
*It is possible to imagine a mutex system which ignores certain types of memory and chooses a more lenient locking implementation (such as relying on the ordering guarantees of normal memory and ignoring specially marked memory). I would argue, however, that such an implementation is not valid.