In the following program I attempt to make the print function thread-safe by using a function-local mutex object:
#include <iostream>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>
void print(const std::string & s)
{
    // Thread safe?
    static std::mutex mtx;
    std::unique_lock<std::mutex> lock(mtx);
    std::cout << s << std::endl;
}
int main()
{
    std::thread([&](){ for (int i = 0; i < 10; ++i) print("a" + std::to_string(i)); }).detach();
    std::thread([&](){ for (int i = 0; i < 10; ++i) print("b" + std::to_string(i)); }).detach();
    std::thread([&](){ for (int i = 0; i < 10; ++i) print("c" + std::to_string(i)); }).detach();
    std::thread([&](){ for (int i = 0; i < 10; ++i) print("d" + std::to_string(i)); }).detach();
    std::thread([&](){ for (int i = 0; i < 10; ++i) print("e" + std::to_string(i)); }).detach();
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
Is this safe?
My doubts arise from this question, which presents a similar case.
C++11
In C++11 and later versions: yes, this pattern is safe. In particular, initialization of function-local static variables is thread-safe, so your code above works safely across threads.
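As a rough illustration of that guarantee, the sketch below has several threads race into a function that owns a function-local static and counts how often the constructor runs. The names Counted, instance and count_constructions are invented for this example and do not come from the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper type: counts how many times its constructor runs.
struct Counted {
    static std::atomic<int> constructions;
    Counted() { ++constructions; }
};
std::atomic<int> Counted::constructions{0};

// The function-local static is initialized on the first call. Since C++11,
// concurrent first calls must not run the constructor more than once.
Counted& instance() {
    static Counted c;
    return c;
}

// Starts n threads that all call instance(), waits for them, and returns
// the number of times the Counted constructor ran.
int count_constructions(int n) {
    assert(n > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] { instance(); });
    for (auto& t : threads)
        t.join();
    return Counted::constructions.load();
}
```

On a conforming C++11 implementation, calling count_constructions(8) from main should report a single construction, however the threads happen to interleave.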
The way this works in practice is that the compiler inserts any necessary boilerplate in the function itself to check whether the variable has been initialized prior to access. In the case of std::mutex as implemented in gcc, clang and icc, however, the initialized state is all-zeros, so no explicit initialization is needed (the variable lives in the all-zeros .bss section, so the initialization is "free"), as we see from the assembly [1]:
Note that starting at the line
    mov edi, OFFSET FLAT:_ZZ3incRiE3mtx
it simply loads the address of the inc::mtx function-local static and calls pthread_mutex_lock on it, without any initialization. The code before that dealing with pthread_key_create is apparently just checking whether the pthreads library is present at all.
There’s no guarantee, however, that all implementations will implement std::mutex as all-zeros, so you might in some cases incur ongoing overhead on each call to check whether the mutex has been initialized. Declaring the mutex outside the function would avoid that.
Here’s an example contrasting the two approaches with a stand-in mutex2 class with a non-inlinable constructor (so the compiler can’t determine that the initial state is all-zeros):
The function-local version compiles (on gcc) to:
Note the large amount of boilerplate dealing with the __cxa_guard_* functions. First, a rip-relative flag byte, _ZGVZ9inc_localRiE3mtx [2], is checked; if it is non-zero, the variable has already been initialized and we fall into the fast path. No atomic operations are needed because on x86, loads already have the needed acquire semantics.
If this check fails, we go to the slow path, which is essentially a form of double-checked locking: the initial check is not sufficient to determine that the variable needs initialization, because two or more threads may be racing here. The __cxa_guard_acquire call does the locking and the second check, and may either fall through to the fast path as well (if another thread concurrently initialized the object), or may jump down to the actual initialization code at .L12.
Finally, note that the last 5 instructions in the assembly aren’t directly reachable from the function at all, as they are preceded by an unconditional jmp .L3 and nothing jumps to them. They are there to be jumped to by an exception handler should the call to the constructor mutex2() throw an exception at some point.
Overall, we can say that the runtime cost of the first-access initialization is low to moderate, because the fast path only checks a single byte flag without any expensive instructions (and the remainder of the function itself usually implies at least two atomic operations for mutex.lock() and mutex.unlock()), but it comes at a significant code size increase.
Compare to the global version, which is identical except that initialization happens during global initialization rather than before first access:
The function is less than a third of the size without any initialization boilerplate at all.
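For readers who want to reproduce the comparison, here is a minimal sketch of what the two variants might look like in source form, together with a hand-written approximation of the guarded initialization. mutex2 and inc_local(int&) are named in the answer; inc_global, inc_manual, demo, the manual_* variables and the body of mutex2 are assumptions made for illustration. The noinline attribute is a GCC/Clang extension, and whether a given compiler actually emits the guard depends on its version and flags. The manual version models only the double-checked locking; it does not model the exception path, which the real guard handles through the ABI's abort routine.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

// Stand-in for std::mutex. The noinline attribute (GCC/Clang extension,
// an assumption here) keeps the constructor opaque to the optimizer.
class mutex2 {
public:
    __attribute__((noinline)) mutex2();
    void lock() { m_.lock(); }
    void unlock() { m_.unlock(); }
private:
    std::mutex m_;
};
mutex2::mutex2() {}

// Function-local version: the compiler guards the first-call initialization.
void inc_local(int& i) {
    static mutex2 mtx;
    std::lock_guard<mutex2> lock(mtx);
    ++i;
}

// Namespace-scope version: constructed during dynamic initialization,
// before main() runs, so the function body needs no guard check.
mutex2 global_mtx;
void inc_global(int& i) {
    std::lock_guard<mutex2> lock(global_mtx);
    ++i;
}

// Hand-written approximation of the guard logic (illustrative names).
alignas(mutex2) unsigned char manual_storage[sizeof(mutex2)];
std::atomic<mutex2*> manual_ptr{nullptr};  // plays the role of the guard flag
std::mutex manual_guard;                   // serializes the slow path

void inc_manual(int& i) {
    // Fast path: one acquire load, no locking once initialized.
    mutex2* p = manual_ptr.load(std::memory_order_acquire);
    if (p == nullptr) {
        // Slow path: lock, then check again in case another thread won.
        std::lock_guard<std::mutex> guard(manual_guard);
        p = manual_ptr.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new (manual_storage) mutex2();
            manual_ptr.store(p, std::memory_order_release);
        }
    }
    std::lock_guard<mutex2> lock(*p);
    ++i;
}

// Calls each variant once; all three behave the same for the caller.
int demo() {
    int i = 0;
    inc_local(i);
    inc_global(i);
    inc_manual(i);
    assert(i == 3);
    return i;
}
```

Compiling this with optimizations and inspecting the output for inc_local and inc_global is one way to check how much guard code a particular toolchain generates.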
Prior to C++11
Prior to C++11, however, this is generally not safe, unless your compiler makes some special guarantees about the way in which static locals are initialized.
Some time ago, while looking at a similar issue, I examined the assembly generated by Visual Studio for this case. The pseudocode for the generated assembly code for your print method looked something like this:
The init_check_print_mtx is a compiler-generated global variable specific to this method which tracks whether the local static has been initialized. Note that inside the "one time" initialization block guarded by this variable, the variable is set to true before the mutex is initialized.
I thought this was silly, since it ensures that other threads racing into this method will skip the initializer and use an uninitialized mtx (versus the alternative of possibly initializing mtx more than once), but in fact doing it this way allows you to avoid the infinite recursion issue that occurs if std::mutex() were to call back into print, and this behavior is in fact mandated by the standard.
Nemo above mentions that this has been fixed (more precisely, re-specified) in C++11 to require a wait for all racing threads, which would make this safe, but you’ll need to check your own compiler for compliance. I didn’t check whether the new spec in fact includes this guarantee, but I wouldn’t be at all surprised, given that local statics were pretty much useless in multi-threaded environments without it (except perhaps for primitive values which didn’t have any check-and-set behavior because they just referred directly to an already initialized location in the .data segment).
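The flag-before-initialization ordering described above can be modelled in a few lines. This is a single-threaded sketch, not the code Visual Studio generated: only the name init_check_print_mtx comes from the answer, while FakeMutex, construct_mtx, print_model, run_model and the two counters are invented for illustration. The re-entrant call stands in for any caller that arrives while initialization is still in progress, which is also the position a racing thread would be in.

```cpp
#include <cassert>

// Hypothetical stand-in for the mutex: only records whether it is constructed.
struct FakeMutex {
    bool constructed = false;
};

bool init_check_print_mtx = false;  // models the compiler-generated flag
FakeMutex mtx;                      // models the function-local static

int constructor_runs = 0;    // how many times the initializer ran
int uninitialized_uses = 0;  // how many callers saw mtx before it was ready

void print_model();

// Models a constructor that calls back into the function being initialized.
void construct_mtx() {
    ++constructor_runs;
    print_model();           // arrives while initialization is in progress
    mtx.constructed = true;  // construction completes only after the callback
}

void print_model() {
    if (!init_check_print_mtx) {
        init_check_print_mtx = true;  // flag is set BEFORE the object is ready
        construct_mtx();
    }
    if (!mtx.constructed)
        ++uninitialized_uses;  // this caller would use an uninitialized mutex
}

// Runs the model once and checks both consequences of the ordering.
void run_model() {
    print_model();
    assert(constructor_runs == 1);    // no infinite recursion
    assert(uninitialized_uses == 1);  // but one caller saw an unready object
}
```

Setting the flag first is what stops the recursion after one level, and it is also why a caller arriving mid-initialization skips the initializer and touches the object too early.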
[1] Note that I changed the print() function to a slightly simpler inc() function that just increments an integer in the locked region. This has the same locking structure and implications as the original, but avoids a bunch of code dealing with the << operators and std::cout.
[2] Using c++filt, this de-mangles to guard variable for inc_local(int&)::mtx.