First of all, I know a semaphore can be implemented with a mutex and a condition variable, but I want the most efficient implementation possible.
I want a semaphore with a fast path for the uncontended case. On Linux this is easy with a futex; for example, here's wait:
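```cpp
if (AtomicDecrementIfPositive(_counter) > 0) return; // Uncontended
AtomicAdd(&_waiters, 1);
do
{
    // Sleep only if _counter is still 0; the kernel performs this check atomically.
    if (syscall(SYS_futex, &_counter, FUTEX_WAIT_PRIVATE, 0, nullptr, nullptr, 0) == -1
        && errno != EAGAIN  // _counter changed before we went to sleep: just retry
        && errno != EINTR)  // interrupted by a signal: just retry
    {
        AtomicAdd(&_waiters, -1);
        throw std::runtime_error("Failed to wait for futex");
    }
}
while (AtomicDecrementIfPositive(_counter) <= 0);
AtomicAdd(&_waiters, -1);
```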
and post:
```cpp
AtomicAdd(&_counter, 1);
// Only enter the kernel if a thread may be sleeping; wake one.
if (Load(_waiters) > 0
    && syscall(SYS_futex, &_counter, FUTEX_WAKE_PRIVATE, 1, nullptr, nullptr, 0) == -1)
    throw std::runtime_error("Failed to wake futex");
```
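For reference, here is a self-contained sketch of the same wait/post pair. It assumes std::atomic<int> for both counters, because the AtomicDecrementIfPositive/AtomicAdd/Load helpers aren't shown, and the class name FutexSemaphore is hypothetical. It also treats FUTEX_WAIT failing with EAGAIN (the counter changed before the thread slept) or EINTR (a signal arrived) as "retry" rather than as an error.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <stdexcept>
#include <thread>
#include <vector>
#include <linux/futex.h>   // FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE
#include <sys/syscall.h>   // SYS_futex
#include <unistd.h>        // syscall

// futex operates on a plain 32-bit int, so this relies on std::atomic<int>
// having the same size and layout as int (true on mainstream Linux toolchains).
static_assert(sizeof(std::atomic<int>) == sizeof(int), "futex needs a 32-bit word");

// Hypothetical name for a self-contained version of the wait/post code above.
class FutexSemaphore {
public:
    explicit FutexSemaphore(int initial = 0) : _counter(initial) {}

    void wait() {
        if (tryDecrement()) return;                  // uncontended: no syscall
        _waiters.fetch_add(1);
        while (!tryDecrement()) {
            // The kernel puts us to sleep only if _counter still equals 0.
            if (futex(FUTEX_WAIT_PRIVATE, 0) == -1 && errno != EAGAIN && errno != EINTR) {
                _waiters.fetch_sub(1);
                throw std::runtime_error("Failed to wait for futex");
            }
        }
        _waiters.fetch_sub(1);
    }

    void post() {
        _counter.fetch_add(1);
        // Enter the kernel only if a thread may be sleeping; wake one.
        if (_waiters.load() > 0 && futex(FUTEX_WAKE_PRIVATE, 1) == -1)
            throw std::runtime_error("Failed to wake futex");
    }

    int value() const { return _counter.load(); }

private:
    // Atomically decrements _counter if it is positive; returns true on success.
    bool tryDecrement() {
        int c = _counter.load();
        while (c > 0)
            if (_counter.compare_exchange_weak(c, c - 1)) return true;  // c reloaded on failure
        return false;
    }

    long futex(int op, int val) {
        return syscall(SYS_futex, reinterpret_cast<int*>(&_counter), op, val,
                       nullptr, nullptr, 0);
    }

    std::atomic<int> _counter;
    std::atomic<int> _waiters{0};
};
```

All atomics use the default sequentially consistent ordering, which is what makes the "increment _waiters, then sleep on _counter" / "increment _counter, then read _waiters" handshake safe: at least one side is guaranteed to see the other's write.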
At first I thought I could simply use NtWaitForKeyedEvent() on Windows. The problem is that it isn't a direct substitute: it doesn't atomically check the value of _counter before going to sleep in the kernel, so a waiter can miss the wake from NtReleaseKeyedEvent(). Worse, NtReleaseKeyedEvent() would then block, because it waits until some thread waits on the key.
What’s the best solution?
I think something like this should work:
Edit: thinking about it more, I'm not sure this would help you much. Your thread pool should usually be big enough that a thread is always ready to process your request. That means not only waits but also posts will usually take the slow path and go into the kernel: waits find the counter at 0, and posts find a waiter to wake. So counting semaphores are probably the one primitive where a userspace-only fast path doesn't really matter, and stock Win32 semaphores should be good enough. That said, I'm happy to be proven wrong!