I’m working with a dual-core Cortex-A9 system, and I’ve been trying to
understand exactly why spinlock functions need to use DMB. It seems
that, as long as the merging store buffer is flushed, the lock value
should end up in the L1 on the unlocking core, and the SCU should
either invalidate or update the value in the L1 of the other core.
That should be enough to maintain coherency and safe locking, right? And
doesn’t STREX skip the merging store buffer anyway, meaning we don’t
even need the flush?
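The SCU keeps each individual location coherent, so every core eventually agrees on the value of the lock word. Coherency does not order the lock word relative to the other data it protects, and LDREX/STREX do not impose that ordering either. That ordering is what the barrier provides. The following is a minimal user-space sketch of the gap, written with C11 atomics and POSIX threads rather than the kernel's own primitives. `payload`, `ready` and `run_message_passing` are hypothetical names. The DMB placement mentioned in the comments is what GCC and Clang typically emit for ARMv7-A, not something the C standard guarantees.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state: `payload` is ordinary data,
 * `ready` plays the role of the lock word. */
static int payload;
static atomic_int ready;

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                      /* plain store to the protected data */
    /* Release: all earlier stores are ordered before the store to `ready`.
     * On ARMv7 a compiler typically emits a DMB before this STR. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    /* Acquire: later loads are ordered after the load that sees `ready`.
     * On ARMv7 a compiler typically emits a DMB after this LDR. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                              /* spin until the flag is published */
    *(int *)arg = payload;             /* guaranteed to observe 42 */
    return NULL;
}

/* Runs one producer/consumer round and returns what the consumer read. */
int run_message_passing(void)
{
    pthread_t p, c;
    int seen = 0;

    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    assert(seen == 42);                /* the flag and the data arrived in order */
    return seen;
}
```

`run_message_passing()` should always return 42. If both orderings were weakened to `memory_order_relaxed`, a weakly ordered core such as the Cortex-A9 would, at the hardware level, be allowed to let the consumer see `ready == 1` and still read the old `payload`, even though each location is individually coherent. The C standard would treat that version as a data race.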
DMB seems like something of a blunt instrument, especially since it
defaults to the full-system domain. I assume that means a write all the
way out to main memory, which can be expensive.
Are the DMBs in the locks there as a workaround for drivers that don’t
use smp_mb properly?
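The same ordering argument suggests why the barriers sit inside the lock and unlock paths. A lock's contract is that every access in the critical section stays between the acquire and the release. With the barriers built in, callers need no extra `smp_mb`. Without them, every user of the lock would have to add one. Below is a minimal sketch, assuming C11 atomics and POSIX threads. `demo_spinlock`, `demo_lock`, `demo_unlock` and `run_counter_test` are hypothetical names, and this is not the Linux `arch_spin_lock` implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical test-and-test-and-set spinlock built on C11 atomics. */
typedef struct {
    atomic_int locked;                 /* 0 = free, 1 = held */
} demo_spinlock;

static void demo_lock(demo_spinlock *l)
{
    /* On ARMv7 the exchange typically becomes an LDREX/STREX loop, and the
     * acquire ordering adds a DMB after it so critical-section accesses
     * cannot be performed before the lock is taken. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire)) {
        /* Spin on a plain load of the cached line to limit STREX traffic. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
        }
    }
}

static void demo_unlock(demo_spinlock *l)
{
    /* Release ordering typically puts a DMB before this store, so the
     * critical section's writes are visible before the lock looks free. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

static demo_spinlock lock;
static long counter;                   /* protected only by `lock` */

enum { THREADS = 4, ITERS = 100000 };

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        demo_lock(&lock);
        counter++;                     /* no separate fence at the call site */
        demo_unlock(&lock);
    }
    return NULL;
}

/* Runs THREADS workers and returns the final counter value. */
long run_counter_test(void)
{
    pthread_t t[THREADS];

    counter = 0;
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    assert(counter == (long)THREADS * ITERS);  /* no lost updates */
    return counter;
}
```

`run_counter_test()` should return 400000 (`THREADS * ITERS`). The inner relaxed-load loop is a test-and-test-and-set refinement: waiters spin on an ordinary load of their cached copy and only retry the exclusive exchange once the lock looks free.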
Based on the performance counters, I’m currently seeing about 5% of
my system cycles lost to stalls caused by DMB.
I found some articles that may answer your question:
In particular: