Now that C++ is adding thread_local storage as a language feature, I’m wondering a few things:
- What is the cost of thread_local likely to be?
  - In memory?
  - For read and write operations?
- Related to that: how do operating systems usually implement this? It would seem that anything declared thread_local would need its own thread-specific storage for each thread created.
Storage space: roughly the size of the variable times the number of threads, or possibly (sizeof(var) + sizeof(var*)) * number of threads if each thread also keeps a pointer to its copy.
There are two basic ways of implementing thread-local storage:
1. Using some sort of system call that gets information about the current kernel thread. Sloooow.
2. Using some pointer, probably in a processor register, that the kernel sets properly at every thread context switch, at the same time as all the other registers. Cheap.
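On x86 platforms, variant 2 is usually implemented via a segment register: FS on x86-64 Linux and GS on 32-bit x86 Linux, while Windows points FS (32-bit) or GS (64-bit) at the thread's Thread Environment Block. Both GCC and MSVC support this. Access times are therefore about as fast as for global variables.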
It is also possible, though I haven't seen it in practice, for this to be implemented via existing library functions such as pthread_getspecific. Performance would then be that of variant 1 or 2, plus library-call overhead. Keep in mind that variant 2 plus library-call overhead is still a lot faster than a kernel call.