I’m new to using gcc inline assembly, and was wondering if, on an x86 multi-core machine, a spinlock (without race conditions) could be implemented as (using AT&T syntax):
    spin_lock:
        mov 0 eax
        lock cmpxchg 1 [lock_addr]
        jnz spin_lock
        ret

    spin_unlock:
        lock mov 0 [lock_addr]
        ret
You have the right idea, but your asm is broken:
- `cmpxchg` can’t work with an immediate operand, only registers.
- `lock` is not a valid prefix for `mov`.
- `mov` to an aligned address is atomic on x86, so you don’t need `lock` anyway.

It has been some time since I’ve used AT&T syntax, hope I remembered everything:
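A minimal sketch of the corrected loop, written as GCC extended inline asm in C rather than as a standalone assembly file. It assumes GCC or Clang targeting x86 or x86-64; the function names and the 0 = free / 1 = held convention follow the question, and the operand constraints are one reasonable choice rather than the only one.

```c
/* Sketch only: assumes GCC/Clang on x86 or x86-64, lock word is 0 (free) or 1 (held). */

static void spin_lock(int *lock_addr)
{
    int old;
    do {
        old = 0;  /* cmpxchg compares the memory operand against EAX (expected: free) */
        __asm__ __volatile__(
            "lock cmpxchgl %2, %0"           /* if (*lock_addr == EAX) *lock_addr = 1; else EAX = *lock_addr */
            : "+m" (*lock_addr), "+a" (old)  /* EAX is read and written by cmpxchg */
            : "r" (1)                        /* new value must be in a register, not an immediate */
            : "memory", "cc");
    } while (old != 0);  /* nonzero means the lock was already held: retry */
}

static void spin_unlock(int *lock_addr)
{
    /* A plain aligned store is atomic on x86, so no lock prefix is needed.
       The "memory" clobber stops the compiler from moving accesses past the store. */
    __asm__ __volatile__("movl $0, %0" : "=m" (*lock_addr) : : "memory");
}
```

Because `cmpxchg` leaves the value it found in memory in EAX on failure, the loop can test `old` directly instead of reading the flags.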
Note that GCC has atomic builtins, so you don’t actually need to use inline asm to accomplish this:
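As a rough sketch of what that can look like, here is the same lock using GCC’s legacy `__sync` builtins (Clang accepts them too). The function names are carried over from the question; the newer `__atomic` builtins would work as well.

```c
/* Sketch only: lock word is 0 (free) or 1 (held). */

static void spin_lock(volatile int *p)
{
    /* Atomically: if (*p == 0) { *p = 1; succeed } else fail. Acts as a full barrier. */
    while (!__sync_bool_compare_and_swap(p, 0, 1))
        ;  /* busy-wait until the swap succeeds */
}

static void spin_unlock(volatile int *p)
{
    /* Writes 0 to *p with release semantics. */
    __sync_lock_release(p);
}
```

On x86 the compare-and-swap compiles to a `lock cmpxchg`, and the release turns into an ordinary store.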
As Bo says below, locked instructions incur a cost: every one you use must acquire exclusive access to the cache line and lock it down while `lock cmpxchg` runs, like for a normal store to that cache line but held for the duration of `lock cmpxchg` execution. This can delay the unlocking thread, especially if multiple threads are waiting to take the lock. Even without many CPUs, it’s still easy and worth it to optimize around: after a failed attempt, spin on a plain read of the lock word, with `pause` in the loop, and only retry the locked instruction once the lock looks free.

The `pause` instruction is vital for performance on HyperThreading CPUs when you’ve got code that spins like this: it lets the second thread execute while the first thread is spinning. On CPUs which don’t support `pause`, it is treated as a `nop`. `pause` also prevents memory-order mis-speculation when leaving the spin-loop, when it’s finally time to do real work again. (See: What is the purpose of the "PAUSE" instruction in x86?)

Note that spin locks are actually rarely used: typically, one uses something like a critical section or futex. These integrate a spin lock for performance under low contention, but then fall back to an OS-assisted sleep and notify mechanism. They may also take measures to improve fairness, and lots of other things the `cmpxchg` / `pause` loop doesn’t do.

Also note that `cmpxchg` is unnecessary for a simple spinlock: you can use `xchg` and then check whether the old value was 0 or not. Doing less work inside the `lock`ed instruction may keep the cache line pinned for less time. See Locks around memory manipulation via inline assembly for a complete asm implementation using `xchg` and `pause` (but still with no fallback to OS-assisted sleep, just spinning indefinitely).
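The two ideas above (take the lock with `xchg`, and spin read-only with `pause` while it is held) can be combined roughly as follows. This is a sketch under the same assumptions as before (GCC or Clang on x86 or x86-64), not the linked implementation; `cpu_relax` is a hypothetical helper name.

```c
/* Sketch only: assumes GCC/Clang on x86 or x86-64, lock word is 0 (free) or 1 (held). */

static inline void cpu_relax(void)  /* hypothetical helper name */
{
    __asm__ __volatile__("pause" ::: "memory");
}

static void spin_lock(volatile int *lock_addr)
{
    for (;;) {
        int old = 1;
        /* xchg with a memory operand is implicitly locked: no lock prefix needed. */
        __asm__ __volatile__("xchgl %0, %1"
                             : "+r" (old), "+m" (*lock_addr)
                             :
                             : "memory");
        if (old == 0)
            return;  /* the lock was free and now holds 1: we own it */

        /* Lock was held: wait with plain loads (no locked instruction)
           until it looks free, then go back and retry the xchg. */
        while (*lock_addr != 0)
            cpu_relax();
    }
}

static void spin_unlock(volatile int *lock_addr)
{
    __asm__ __volatile__("" ::: "memory");  /* compiler barrier before the releasing store */
    *lock_addr = 0;                         /* plain aligned store is atomic on x86 */
}
```

Like the earlier versions, this spins indefinitely and makes no attempt at fairness or at falling back to an OS-assisted sleep.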