I just asked a question involving volatile: volatile array c++
However my question spawned a discussion on what volatile does.
Some claim that when using CreateThread(), you don’t have to worry about volatile.
Microsoft, on the other hand, gives an example of volatile with two threads created by CreateThread().
I created the following sample in Visual C++ Express 2010, and it doesn’t matter whether you mark done as volatile or not:
#include "targetver.h"
#include <Windows.h>
#include <stdio.h>
#include <iostream>
#include <tchar.h>
using namespace std;

bool done = false; // shared flag; not marked volatile here

DWORD WINAPI thread1(LPVOID args)
{
    // Spin until thread2 sets the flag.
    while(!done)
    {
    }
    cout << "Thread 1 done!\n";
    return 0;
}

DWORD WINAPI thread2(LPVOID args)
{
    Sleep(1000);
    done = true;
    cout << "Thread 2 done!\n";
    return 0;
}

int _tmain(int argc, _TCHAR* argv[])
{
    DWORD thread1Id;
    HANDLE hThread1;
    DWORD thread2Id;
    HANDLE hThread2;
    hThread1 = CreateThread(NULL, 0, thread1, NULL, 0, &thread1Id);
    hThread2 = CreateThread(NULL, 0, thread2, NULL, 0, &thread2Id);
    // Give both threads time to finish before exiting.
    Sleep(4000);
    CloseHandle(hThread1);
    CloseHandle(hThread2);
    return 0;
}
Can you ALWAYS be sure that thread 1 will stop if done is not volatile?
What volatile does:
What volatile does not:
Some non-portable behaviors that shouldn’t be relied on in cross-platform C++:
VC++ extends volatile to prevent any reordering with other instructions. Other compilers don’t, because it negatively affects optimization.
Most of the time, what people really want are fences (also called barriers) and atomic instructions, which are usable if you’ve got a C++11 compiler, or via compiler- and architecture-dependent functions otherwise.
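As a small illustration of an atomic instruction, here is a sketch using C++11 std::atomic. The helper name count_with_atomic and the iteration count are made up for the example; nothing here comes from the question’s code:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: two threads each add `per_thread` increments
// to one shared counter, and the final total is returned.
int count_with_atomic(int per_thread)
{
    std::atomic<int> counter(0);

    auto worker = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            // One indivisible read-modify-write. Incrementing a
            // volatile int with ++ gives no such guarantee.
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();

    int total = counter.load();
    assert(total == 2 * per_thread); // no increments were lost
    return total;
}
```

Calling count_with_atomic(100000) should return 200000 on every run. With a plain or volatile int in place of the atomic, the increments would race and the total would be unreliable.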
Fences ensure that, at the point of use, all the previous reads/writes will be completed. In C++11, fences are controlled at various points using the std::memory_order enumeration. In VC++ you can use _ReadBarrier(), _WriteBarrier(), and _ReadWriteBarrier() to do this. I’m not sure about other compilers.
On some architectures like x86, a fence is often merely a way to prevent the compiler from reordering instructions. On others, it might actually emit an instruction to prevent the CPU itself from reordering things.
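For illustration, here is a C++11 sketch of a pair of explicit fences built on std::atomic_thread_fence. The payload variable, the ready flag, and all function names are hypothetical and not taken from the question:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready(false);  // flag used only for signalling

void producer()
{
    payload = 42;
    // Release fence: the write above may not move below the store that follows.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer()
{
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: the read below may not move above the load that saw true.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Hypothetical driver: runs both sides once and returns what the consumer saw.
int run_fence_demo()
{
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    assert(seen == 42);
    return seen;
}
```

The flag itself uses relaxed operations; the ordering comes entirely from the two fences, which is what separates this from a plain volatile flag.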
Here’s an example of improper use:
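A minimal sketch of that kind of misuse, assuming two plain int results and a volatile flag. The function names work_thread and spinning_thread are hypothetical, chosen only for illustration:

```cpp
int res1, res2;
volatile bool finished = false;

// Hypothetical worker: computes two results, then raises the flag.
void work_thread(int a, int b)
{
    res1 = a + b;
    res2 = a - b;
    finished = true;  // may be reordered ahead of the plain writes above
}

// Hypothetical waiter: spins until the flag is raised.
void spinning_thread()
{
    while (!finished) { }  // spin-wait for the results
    // res1 and res2 are not guaranteed to be set at this point
}
```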
Here, finished is allowed to be reordered to before either res is set! Well, volatile prevents reordering with other volatile, right? Let’s try making each res volatile too, so that res1, res2, and finished are all volatile.
This trivial change will actually work on x86, but it is going to be inefficient. For one, this forces res1 to be set before res2, even though we don’t really care about that… we just want both of them set before finished is. Forcing this ordering between res1 and res2 will only prevent valid optimizations, eating away at performance.
For more complex problems, you’ll have to make every write volatile. This would bloat your code, be very error prone, and become slow as it prevents a lot more reordering than you really wanted.
It’s not realistic. So we use fences and atomics. They allow full optimization, and only guarantee that the memory access will complete at the point of the fence:
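A sketch of how this could look with C++11 atomics, assuming plain int results and a std::atomic<bool> flag. The function names, including the run_release_acquire_demo driver, are hypothetical:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int res1, res2;                     // plain, non-atomic results
std::atomic<bool> finished(false);  // atomic flag

void work_thread(int a, int b)
{
    res1 = a + b;
    res2 = a - b;
    // Release: both writes above are visible to whoever acquires the flag.
    finished.store(true, std::memory_order_release);
}

int spinning_thread()
{
    // Acquire: pairs with the release store in work_thread.
    while (!finished.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return res1 + res2;  // both results are guaranteed to be set here
}

// Hypothetical driver: runs both threads once and returns the waiter's result.
int run_release_acquire_demo(int a, int b)
{
    finished.store(false);
    int seen = 0;
    std::thread spinner([&seen] { seen = spinning_thread(); });
    std::thread worker(work_thread, a, b);
    worker.join();
    spinner.join();
    assert(seen == 2 * a);  // (a + b) + (a - b)
    return seen;
}
```

Only the flag needs to be atomic. res1 and res2 stay ordinary variables, so the compiler remains free to optimize the writes to them.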
This will work for all architectures. The res1 and res2 operations can be reordered relative to each other as the compiler sees fit. Performing an atomic release ensures that all preceding non-atomic operations complete and become visible to threads that perform an atomic acquire on the same variable.