I have a small test application that runs two threads simultaneously. One increments a `static long _value`, the other decrements it. I've used `ProcessThread.ProcessorAffinity` to pin the threads to different physical cores (no Hyper-Threading) to force inter-core communication, and I have ensured that their execution overlaps for a significant amount of time.
Of course, the following does not lead to zero:
```csharp
for (long i = 0; i < 10000000; i++)
{
    _value += offset;
}
```
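Here is a minimal runnable sketch of the question's setup. It assumes .NET in a 64-bit process on a multi-core machine, with two threads running the unsynchronized loop against a shared static field. The names `RaceDemo`, `Run` and `Loop` are hypothetical. The `ProcessorAffinity` pinning from the question is left out for brevity, so the OS may schedule the threads on any cores. In a 64-bit process a single `long` read or write is atomic. The race is in the separate read, add and write steps.

```csharp
using System;
using System.Threading;

class RaceDemo
{
    // Shared counter as in the question; deliberately unsynchronized.
    static long _value;

    // Starts one thread adding +1 and one adding -1, `iterations` times each,
    // then waits for both and returns the final value.
    public static long Run(long iterations)
    {
        _value = 0;
        var inc = new Thread(() => Loop(+1, iterations));
        var dec = new Thread(() => Loop(-1, iterations));
        inc.Start();
        dec.Start();
        inc.Join();
        dec.Join();
        return _value;
    }

    static void Loop(long offset, long iterations)
    {
        for (long i = 0; i < iterations; i++)
        {
            // Non-atomic load, add, store: a concurrent update can be overwritten.
            _value += offset;
        }
    }

    static void Main()
    {
        // Often non-zero on a multi-core machine, but neither outcome is guaranteed.
        Console.WriteLine(Run(10000000));
    }
}
```

The printed value varies from run to run. It depends on thread scheduling and on how the JIT compiles the loop.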
So the logical fix would be:
```csharp
for (long i = 0; i < 10000000; i++)
{
    Interlocked.Add(ref _value, offset);
}
```
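This is the same sketch using `Interlocked.Add`, again with hypothetical class and method names. Each `Interlocked.Add` call is a single atomic read-modify-write, so no increment or decrement can be lost and the final value is 0.

```csharp
using System;
using System.Threading;

class InterlockedDemo
{
    static long _value;

    public static long Run(long iterations)
    {
        _value = 0;
        var inc = new Thread(() => Loop(+1, iterations));
        var dec = new Thread(() => Loop(-1, iterations));
        inc.Start();
        dec.Start();
        inc.Join();
        dec.Join();
        // Join makes both threads' writes visible to this thread.
        return Interlocked.Read(ref _value);
    }

    static void Loop(long offset, long iterations)
    {
        for (long i = 0; i < iterations; i++)
        {
            // Atomic add; on x86/x64 this is typically a LOCK-prefixed instruction.
            Interlocked.Add(ref _value, offset);
        }
    }

    static void Main()
    {
        Console.WriteLine(Run(10000000)); // 0
    }
}
```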
Which of course leads to zero.
However, the following also leads to zero:
```csharp
for (long i = 0; i < 10000000; i++)
{
    lock (_syncRoot)
    {
        _value += offset;
    }
}
```
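Here is a sketch of the `lock` variant with the statement written out by hand. The C# compiler expands `lock (_syncRoot) { ... }` into roughly the `Monitor.Enter(obj, ref lockTaken)` / `try` / `finally { Monitor.Exit(obj); }` pattern shown below. Names other than `_syncRoot` and `_value` are hypothetical.

```csharp
using System;
using System.Threading;

class LockDemo
{
    static long _value;
    static readonly object _syncRoot = new object();

    public static long Run(long iterations)
    {
        _value = 0;
        var inc = new Thread(() => Loop(+1, iterations));
        var dec = new Thread(() => Loop(-1, iterations));
        inc.Start();
        dec.Start();
        inc.Join();
        dec.Join();
        return _value;
    }

    static void Loop(long offset, long iterations)
    {
        for (long i = 0; i < iterations; i++)
        {
            // Equivalent to: lock (_syncRoot) { _value += offset; }
            bool lockTaken = false;
            try
            {
                Monitor.Enter(_syncRoot, ref lockTaken);
                _value += offset; // only one thread at a time runs this read-modify-write
            }
            finally
            {
                if (lockTaken) Monitor.Exit(_syncRoot);
            }
        }
    }

    static void Main()
    {
        Console.WriteLine(Run(10000000)); // 0
    }
}
```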
Of course, the `lock` statement ensures that the reads and writes are not reordered, because it employs a full fence. However, I cannot find any information about the synchronization of processor caches. If there were no cache synchronization, shouldn't I see a deviation from 0 after both threads have finished?
Can someone explain to me how lock/Monitor.Enter/Exit ensures that processor caches (L1/L2 caches) are synchronized?
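Cache coherence in this case does not depend on `lock`. The `lock` statement ensures that the instruction sequences of the two threads are not interleaved. `a += b` is not atomic for the processor. It is a read-modify-write sequence: load `a` into a register, add `b` to the register, and store the register back to `a`. Without `lock`, the two threads can interleave these steps. For example, both threads might load the same old value before either one stores. One thread's store then overwrites the other's, and an update is lost.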
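The answer's original assembler listing is not reproduced here. As a stand-in, this single-threaded sketch replays the load/add/store steps of `_value += offset` for two threads in one fixed interleaving. The locals `regA` and `regB` are hypothetical stand-ins for each core's register, and `LostUpdateDemo` is a hypothetical name.

```csharp
using System;

class LostUpdateDemo
{
    // Both threads load before either stores: one update is lost.
    public static long Interleaved()
    {
        long value = 0;
        long regA = value;  // thread A: load  (0)
        long regB = value;  // thread B: load  (0)
        regA += 1;          // thread A: add   (+1)
        regB += -1;         // thread B: add   (-1)
        value = regA;       // thread A: store (1)
        value = regB;       // thread B: store (-1), A's increment is overwritten
        return value;
    }

    // Each thread runs its whole sequence without interruption, as under a lock.
    public static long Serialized()
    {
        long value = 0;
        long regA = value; regA += 1; value = regA;   // thread A
        long regB = value; regB += -1; value = regB;  // then thread B
        return value;
    }

    static void Main()
    {
        Console.WriteLine(Interleaved()); // -1
        Console.WriteLine(Serialized());  // 0
    }
}
```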
But that is not about cache coherence; it is a higher-level feature.
So `lock` does not ensure that the caches are synchronized. Cache synchronization (cache coherence) is an internal processor feature that does not depend on your code. You can read about it here. When one core writes a value to memory, the coherence protocol invalidates the copies of that cache line held by other cores. When the second core then tries to read the value, its entry is no longer valid, so a cache miss occurs. That miss forces the entry to be reloaded with the current value.
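Below is a toy sketch of the write-invalidate idea described above. It assumes a single shared memory cell, a one-line cache per core, and write-through stores. Real processors use protocols such as MESI, which have more states and transfer whole cache lines. `ToyCoherence` (a hypothetical name) only shows why a read after another core's write misses instead of returning a stale value.

```csharp
using System;

// Toy model (hypothetical): one memory cell, one single-entry cache per core.
// A write invalidates every other core's copy; a heavy simplification of MESI.
class ToyCoherence
{
    long _memory;
    readonly long?[] _cache;      // null means the entry is Invalid
    public int Misses { get; private set; }

    public ToyCoherence(int cores) { _cache = new long?[cores]; }

    public long Read(int core)
    {
        if (_cache[core] is long cached) return cached; // hit: use the cached copy
        Misses++;                                       // miss: fetch the current value
        _cache[core] = _memory;
        return _memory;
    }

    public void Write(int core, long v)
    {
        for (int c = 0; c < _cache.Length; c++)
            if (c != core) _cache[c] = null;            // invalidate other cores' copies
        _cache[core] = v;
        _memory = v;                                    // write-through to memory
    }

    static void Main()
    {
        var sys = new ToyCoherence(2);
        Console.WriteLine(sys.Read(1)); // 0  (miss; core 1 now caches 0)
        sys.Write(0, 42);               // core 0 writes; core 1's copy is invalidated
        Console.WriteLine(sys.Read(1)); // 42 (miss again, not the stale 0)
        Console.WriteLine(sys.Misses);  // 2
    }
}
```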