Is this a correct implementation for a generic atomic swap function? I’m looking for a C++03-compatible solution on GCC.
    template<typename T>
    void atomic_swap(T & a, T & b) {
        static_assert(sizeof(T) <= sizeof(void*), "Maximum size type exceeded.");
        T * ptr = &a;
        b = __sync_lock_test_and_set(ptr, b);
        __sync_lock_release(&ptr);
    }
If not, what should I do to fix it?
Also: is the __sync_lock_release always necessary? When searching through other codebases I found that this is often not called. Without the release call my code looks like this:
    template<typename T>
    void atomic_swap(T & a, T & b) {
        static_assert(sizeof(T) <= sizeof(void*), "Maximum size type exceeded.");
        b = __sync_lock_test_and_set(&a, b);
    }
PS: Atomic swap in GNU C++ is a similar question but it doesn’t answer my question because the provided answer requires C++11’s std::atomic and it has signature Data *swap_data(Data *new_data) which doesn’t seem to make sense at all for a swap function. (It actually swaps the provided argument with a global variable that was defined before the function.)
Keep in mind that this version of swap is not a fully atomic operation. While the value of `b` is atomically copied into `a`, the old value of `a` may then be written over another thread’s modification of `b`. In other words, the assignment to `b` is not atomic with respect to other threads. You could start with `a == 1` and `b == 2`; after the GCC built-in, `a == 2` and the value `1` is returned. If another thread now changes `b` to `3`, you overwrite that value in `b` with `1`. So while you have “technically” swapped the values, you didn’t do it atomically: another thread touched `b` between the return from the GCC atomic built-in and the assignment of that return value to `b`. Looked at from the assembly standpoint, you have two separate steps: an atomic exchange on `a`, followed by an ordinary store of the returned value to `b`.
To be honest, you can’t do a lock-free atomic swap of two separate memory locations without a hardware operation like a DCAS or a weak load-linked/store-conditional, or possibly some other method like transactional memory (which itself tends to use fine-grained locking).
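To make the lost update described above concrete, here is a minimal single-threaded sketch that replays the same interleaving by hand. The function name `replay_lost_update` and the inline "other thread" write are hypothetical, added only for illustration; in a real program the write of `3` would come from a different thread at an unpredictable moment.

```cpp
// Replays the interleaving from the answer on a single thread.
// Assumes GCC (or a compatible compiler) providing the __sync built-ins.
// Returns the value written by the simulated "other thread", which is lost.
int replay_lost_update(int& a, int& b) {
    // Step 1: atomic exchange. `a` receives b's value; a's old value is returned.
    int old_a = __sync_lock_test_and_set(&a, b);

    // Window between the two steps: another thread could write to `b` here.
    // Simulated inline so that the outcome is deterministic.
    b = 3;
    int lost = b;

    // Step 2: ordinary, non-atomic store. It overwrites the other thread's value.
    b = old_a;
    return lost;
}
```

Starting from `a == 1` and `b == 2`, the function leaves `a == 2` and `b == 1`, and the intermediate `3` is silently discarded.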
Secondly, as your function is written right now, if you want your atomic operation to have both acquire and release semantics, then yes, you’re going to have to either put in the `__sync_lock_release` or add a full memory barrier through `__sync_synchronize`. Otherwise it will only have the acquire semantics of `__sync_lock_test_and_set`. Even so, it does not atomically swap two separate memory locations with each other.
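As a rough sketch of the full-barrier option, assuming GCC’s legacy `__sync` built-ins and a `T` that is an integral or pointer type of a supported size, the exchange could look like the following. The name `exchange_with_full_barrier` and the array-size check are hypothetical additions: the check stands in for `static_assert`, which is a C++11 feature and not available in C++03. The sketch uses `__sync_synchronize` instead of `__sync_lock_release` because GCC documents the latter as normally writing `0` to its target.

```cpp
// Sketch only: exchange `a` with `b` under a full memory barrier.
// The final store to `b` remains a separate, non-atomic step.
template <typename T>
void exchange_with_full_barrier(T& a, T& b) {
    // C++03 stand-in for static_assert: a negative array size fails to compile.
    typedef char size_check[sizeof(T) <= sizeof(void*) ? 1 : -1];
    (void)sizeof(size_check);  // silences unused-typedef warnings

    __sync_synchronize();                     // full barrier before the exchange
    T old = __sync_lock_test_and_set(&a, b);  // atomic exchange (acquire barrier)
    b = old;                                  // plain store, not atomic with the exchange
}
```

This only changes the ordering guarantees around the exchange on `a`; it does nothing to close the window between the exchange and the store to `b`.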