I have a question about race conditions and simultaneous writes.
I have a class whose objects are accessed from different threads. I would like to calculate some values only on demand and cache the result. For performance reasons I’d rather not use locks (before anyone asks – yes it is relevant in my case).
This constitutes a race condition. However, the objects are const and won’t be changed. So if different threads calculate values to be cached they are in my use case guaranteed to be identical. Would it be safe to write these values without locking? Or, in broader terms, is it safe to write identical content to memory from different threads without locking?
The values written are of types bool and double and the architectures in question may be x86 and ARM.
EDIT: Thanks to everyone for their input. I have finally decided to find a way that does not involve caching. This approach does seem too much like a ‘hack’ and there is the problem with using a flag variable.
As you say, this is a race condition. Under C++11 it is technically a data race, and undefined behaviour. It doesn’t matter that the values are the same.
If your compiler supports it (e.g. recent gcc, or gcc or MSVC with my Just::Thread library) then you can use std::atomic<some_pod_struct> to provide an atomic wrapper around your data (assuming it is a POD struct; if it isn’t then you have bigger problems). If it is small enough then the compiler will make it lock-free, and use the appropriate atomic operations. For larger structures the library will use a lock.
The problem with doing this without atomic operations or locks is visibility. Whilst there is no problem at the processor level on either x86 or ARM with writing the same data (assuming it really is byte-for-byte identical) from multiple threads/processors to the same memory, given that this is a cache, I expect you’ll want to read this data rather than recalculate it if it has already been written. You’ll therefore need some sort of flag to indicate done-ness. Unless you use atomic operations, locks or suitable memory barrier instructions then the “ready” flag may become visible to another processor before the data does. This will then really mess things up, as the second processor now reads an incomplete set of data.
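To illustrate the std::atomic wrapper suggested above, here is a minimal sketch. The names Cached, Shape and area() are hypothetical, and the struct uses a float rather than a double purely so that it fits in 8 bytes; a double plus a bool would typically occupy 16 bytes, which may fall back to a lock on some toolchains. Because the value and its "ready" marker travel together in one atomic object, a reader can never see one without the other.

```cpp
#include <atomic>

// Hypothetical POD holding the cached value together with its "ready" marker.
struct Cached {
    float value;
    bool  ready;
};

class Shape {
public:
    explicit Shape(float side) : side_(side) {}

    // Computes the area on first use and caches it afterwards.
    // Several threads may compute and store at the same time; each store
    // replaces the whole struct atomically, so this is not a data race.
    float area() const {
        Cached c = cache_.load();
        if (!c.ready) {
            c = Cached{side_ * side_, true};
            cache_.store(c);
        }
        return c.value;
    }

    // Reports whether this platform implements the wrapper without a lock.
    bool cache_is_lock_free() const { return cache_.is_lock_free(); }

private:
    float side_;
    mutable std::atomic<Cached> cache_{Cached{0.0f, false}};
};
```

It is worth checking cache_is_lock_free() on each target, since the standard leaves that to the implementation.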
You could write the data with non-atomic operations, and then use an atomic data type for the flag. Under C++11 this will generate suitable memory barriers and synchronization to ensure that the data is visible to any thread that sees the flag set. It’s still undefined behaviour for two threads to write the data, but it may be OK in practice.
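A rough sketch of the flag ordering described above, assuming a hypothetical class PublishedResult. To keep the example itself free of undefined behaviour it has a single writer calling publish(); in the lazy cache from the question several threads could reach the plain write at once, which is the data race already mentioned.

```cpp
#include <atomic>

class PublishedResult {
public:
    // Must be called by one thread only: value_ is a plain, non-atomic double.
    void publish(double value) {
        value_ = value;                                 // plain write
        ready_.store(true, std::memory_order_release);  // publish after the data
    }

    // Safe to call from any thread. Returns true and fills 'out' only if
    // the flag is set, in which case the data write is guaranteed visible.
    bool try_get(double& out) const {
        if (ready_.load(std::memory_order_acquire)) {
            out = value_;
            return true;
        }
        return false;
    }

private:
    double value_ = 0.0;
    std::atomic<bool> ready_{false};
};
```

The release store paired with the acquire load is what prevents a reader from seeing the flag before the data.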
Alternatively, store the data in a block of heap memory allocated by each thread that does the calculation, and use a compare-and-swap operation to set an atomic pointer variable. If the compare-and-swap fails then another thread got there first, so free the data.
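The compare-and-swap approach might look something like the following sketch. Result, LazyObject and the calculation inside get() are hypothetical placeholders. Each thread builds its own heap copy, and only the thread whose compare-and-swap succeeds installs it; the others free their copy and use the winner's.

```cpp
#include <atomic>

// Hypothetical result type with the bool and double from the question.
struct Result {
    double value;
    bool   flag;
};

class LazyObject {
public:
    explicit LazyObject(double x) : x_(x) {}
    ~LazyObject() { delete cache_.load(std::memory_order_relaxed); }

    const Result& get() const {
        // Fast path: somebody has already published a result.
        Result* cached = cache_.load(std::memory_order_acquire);
        if (cached) return *cached;

        // Compute into memory private to this thread (placeholder calculation).
        Result* fresh = new Result{x_ * 2.0, x_ > 0.0};

        Result* expected = nullptr;
        if (cache_.compare_exchange_strong(expected, fresh,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
            return *fresh;      // we won: our block is now the cache
        }
        delete fresh;           // another thread got there first
        return *expected;       // 'expected' now holds the winner's pointer
    }

private:
    double x_;
    mutable std::atomic<Result*> cache_{nullptr};
};
```

No thread ever writes to memory that another thread can read, so there is no data race, at the cost of an occasional wasted allocation and calculation.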