Why is the CompareAndSwap instruction considered expensive?
I read in a book:
“Memory barriers are expensive, about as expensive as an atomic compareAndSet() instruction.”
Thanks!
“CAS isn’t appreciably different than a normal store. Some of the misinformation regarding CAS probably arises from the original implementation of lock:cmpxchg (CAS) on Intel processors. The lock: prefix caused the LOCK# signal to be asserted, acquiring exclusive access to the bus. This didn’t scale of course. Subsequent implementations of lock:cmpxchg leverage cache coherency protocol — typically snoop-based MESI — and don’t assert LOCK#.” – David Dice, Biased locking in HotSpot
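As a minimal Java sketch of what a CAS is typically used for, the hypothetical class `CasCounter` below increments a shared counter with `AtomicInteger.compareAndSet()` in a read-compute-retry loop. On x86, HotSpot typically compiles `compareAndSet()` to a `lock cmpxchg` instruction, but the exact machine code depends on the JVM and CPU, so treat this as an illustration rather than a statement about any particular JIT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class: a counter incremented with a CAS retry loop.
public class CasCounter {
    // Increments `counter` by one using compare-and-swap.
    static void increment(AtomicInteger counter) {
        while (true) {
            int current = counter.get();   // read the current value
            int next = current + 1;        // compute the new value
            // Succeeds only if no other thread changed the value since we read it.
            if (counter.compareAndSet(current, next)) {
                return;
            }
            // Another thread won the race: re-read and retry.
        }
    }

    // Starts `threads` threads that each call increment() `perThread` times,
    // waits for all of them, and returns the final count.
    static int run(int threads, int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    increment(counter);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 100_000)); // prints 400000
    }
}
```

When the counter is uncontended, each CAS succeeds on the first attempt; under contention, the failed attempts and the cache line bouncing between cores are where most of the cost shows up.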
This is quite true.
For example, on x86 a proper CAS on a multiprocessor system carries a lock prefix (lock cmpxchg), and the lock prefix results in a full memory barrier.
In fact, a standalone memory barrier is implemented as a dummy LOCK OR or LOCK AND in both the .NET and the Java JIT on x86/x64. So on x86, CAS results in a full memory barrier.
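Why a full barrier is needed at all can be seen in the classic store-buffering (Dekker) pattern. x86 allows a load to be reordered before an earlier store to a different address, so for Java `volatile` accesses the JIT has to put a full barrier between the store and the later load; on x86 this is typically a lock-prefixed dummy instruction. The class `StoreLoadDemo` is hypothetical; what the Java Memory Model does guarantee is that with volatile (`AtomicInteger`) accesses, the outcome where both threads read 0 never occurs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class: one round of the store-buffering (Dekker) test.
public class StoreLoadDemo {
    // Returns true if both threads read 0, i.e. each load was ordered before
    // the other thread's store. The JMM forbids this for volatile accesses.
    static boolean bothSawZero() throws InterruptedException {
        AtomicInteger x = new AtomicInteger();   // get()/set() have volatile semantics
        AtomicInteger y = new AtomicInteger();
        int[] r = new int[2];

        Thread a = new Thread(() -> {
            x.set(1);          // volatile store; the JIT must keep the next load after it
            r[0] = y.get();    // volatile load
        });
        Thread b = new Thread(() -> {
            y.set(1);
            r[1] = x.get();
        });
        a.start();
        b.start();
        a.join();              // join() makes r[0] and r[1] visible to this thread
        b.join();
        return r[0] == 0 && r[1] == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        int violations = 0;
        for (int i = 0; i < 1_000; i++) {
            if (bothSawZero()) {
                violations++;
            }
        }
        System.out.println("violations: " + violations); // prints "violations: 0"
    }
}
```

With plain `int` fields instead of `AtomicInteger`, no barrier is required and the (0, 0) outcome becomes legal, although any particular run may or may not observe it.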
On PPC, it is different. An LL/SC pair (lwarx and stwcx.) can be used to load the memory operand into a register, then either write it back if there was no other store to the target location, or retry the whole loop if there was. An LL/SC sequence can be interrupted, and it does not imply a full fence.
Performance characteristics and behaviour can be very different on different architectures.
But then again – a weak LL/SC is not CAS.
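The weak/strong distinction can be sketched in Java 9+, where `AtomicInteger.weakCompareAndSetVolatile()` is allowed to fail spuriously, while `compareAndSet()` fails only when the current value differs from the expected one. On LL/SC machines such as PowerPC, a weak CAS can map to a single `lwarx`/`stwcx.` attempt, whereas a strong CAS needs an internal retry loop to hide spurious failures such as a lost reservation. The class and method names below are hypothetical, and how a particular JVM lowers these calls is an assumption this sketch does not check.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class contrasting weak and strong CAS.
public class WeakCasDemo {
    // Atomically doubles `value` and returns the new value.
    // A weak CAS may fail even when the value still equals `current`
    // (much like an LL/SC pair whose reservation was lost), so it must sit in a retry loop.
    static int doubleIt(AtomicInteger value) {
        int current;
        do {
            current = value.get();
        } while (!value.weakCompareAndSetVolatile(current, current * 2));
        return current * 2;
    }

    // A strong compareAndSet() fails only if the value actually differed from `expected`,
    // so a single call is enough to decide whether the replacement happened.
    static boolean strongReplace(AtomicInteger value, int expected, int replacement) {
        return value.compareAndSet(expected, replacement);
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(21);
        System.out.println(doubleIt(v));               // prints 42
        System.out.println(strongReplace(v, 42, 7));   // prints true
        System.out.println(strongReplace(v, 42, 9));   // prints false (value is now 7)
        System.out.println(v.get());                   // prints 7
    }
}
```

Because a weak CAS gives no guarantee of progress on a single call, it only fits algorithms that already loop; code that needs a definitive answer from one attempt should use the strong form.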