This post may seem overly long for just the short question at the end of it. But I also need to describe a design pattern I just came up with. Maybe it’s commonly used, but I’ve never seen it (or maybe it just doesn’t work :).
First, here’s some code that (to my understanding) has undefined behavior due to the “static initialization order fiasco”. The initialization of Spanish::s_englishToSpanish depends on English::s_numberToStr. Both are statically initialized, in different files (translation units), so the relative order of those two initializations is not specified:
File: English.h
#pragma once
#include <vector>
#include <string>
using namespace std;
struct English {
    static vector<string>* s_numberToStr;
    string m_str;
    explicit English(int number)
    {
        m_str = (*s_numberToStr)[number];
    }
};
File: English.cpp
#include "English.h"
vector<string>* English::s_numberToStr = new vector<string>( /*split*/
    []() -> vector<string>
    {
        vector<string> numberToStr;
        numberToStr.push_back("zero");
        numberToStr.push_back("one");
        numberToStr.push_back("two");
        return numberToStr;
    }());
File: Spanish.h
#pragma once
#include <map>
#include <string>
#include "English.h"
using namespace std;
typedef map<string, string> MapType;
struct Spanish {
    static MapType* s_englishToSpanish;
    string m_str;
    explicit Spanish(const English& english)
    {
        m_str = (*s_englishToSpanish)[english.m_str];
    }
};
File: Spanish.cpp
#include "Spanish.h"
MapType* Spanish::s_englishToSpanish = new MapType( /*split*/
    []() -> MapType
    {
        MapType englishToSpanish;
        englishToSpanish[ English(0).m_str ] = "cero";
        englishToSpanish[ English(1).m_str ] = "uno";
        englishToSpanish[ English(2).m_str ] = "dos";
        return englishToSpanish;
    }());
File: StaticFiasco.cpp
#include <stdio.h>
#include <tchar.h>
#include <conio.h>
#include "Spanish.h"
int _tmain(int argc, _TCHAR* argv[])
{
    _cprintf( Spanish(English(1)).m_str.c_str() ); // may print "uno" or crash
    _getch();
    return 0;
}
To solve the static initialization order problem, we use the construct-on-first-use idiom, and make those static initializations function-local like so:
File: English.h
#pragma once
#include <vector>
#include <string>
using namespace std;
struct English {
    string m_str;
    explicit English(int number)
    {
        static vector<string>* numberToStr = new vector<string>( /*split*/
            []() -> vector<string>
            {
                vector<string> numberToStr_;
                numberToStr_.push_back("zero");
                numberToStr_.push_back("one");
                numberToStr_.push_back("two");
                return numberToStr_;
            }());
        m_str = (*numberToStr)[number];
    }
};
File: Spanish.h
#pragma once
#include <map>
#include <string>
#include "English.h"
using namespace std;
struct Spanish {
    string m_str;
    explicit Spanish(const English& english)
    {
        typedef map<string, string> MapT;
        static MapT* englishToSpanish = new MapT( /*split*/
            []() -> MapT
            {
                MapT englishToSpanish_;
                englishToSpanish_[ English(0).m_str ] = "cero";
                englishToSpanish_[ English(1).m_str ] = "uno";
                englishToSpanish_[ English(2).m_str ] = "dos";
                return englishToSpanish_;
            }());
        m_str = (*englishToSpanish)[english.m_str];
    }
};
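As a side note, the property this version relies on, namely that a function-local static is initialized exactly once, on the first call that reaches it, can be checked with a small sketch. The names englishName, g_initCount and initCountAfterLookups are made up for illustration and are not part of the classes above:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical counter, only here to observe how often the initializer runs.
static int g_initCount = 0;

// Construct-on-first-use: the table is built the first time this function is
// called. The pointer is intentionally never deleted, as in the code above.
const std::string& englishName(std::size_t number)
{
    static const std::vector<std::string>* table =
        []() -> const std::vector<std::string>*
        {
            ++g_initCount; // part of the one-time initialization
            return new std::vector<std::string>{"zero", "one", "two"};
        }();
    assert(number < table->size());
    return (*table)[number];
}

// Performs several lookups and reports how many times the table was built.
int initCountAfterLookups()
{
    englishName(0);
    englishName(1);
    englishName(2);
    return g_initCount;
}
```

If this behaves as I expect, initCountAfterLookups() reports a single initialization no matter how many lookups were made before it.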
But now we have another problem. Because of the function-local static data, neither of those classes is thread-safe, at least on compilers that don’t implement C++11’s thread-safe initialization of function-local statics (Visual C++ before Visual Studio 2015, for example). To solve this, we add a static member variable to each class, along with an initialization function for it. Inside that function we force the initialization of all the function-local static data by calling, once, every function that has function-local static data. In effect, we’re initializing everything at program startup while still controlling the order of initialization. So now our classes should be thread-safe:
File: English.h
#pragma once
#include <vector>
#include <string>
using namespace std;
struct English {
    static bool s_areStaticsInitialized;
    string m_str;
    explicit English(int number)
    {
        static vector<string>* numberToStr = new vector<string>( /*split*/
            []() -> vector<string>
            {
                vector<string> numberToStr_;
                numberToStr_.push_back("zero");
                numberToStr_.push_back("one");
                numberToStr_.push_back("two");
                return numberToStr_;
            }());
        m_str = (*numberToStr)[number];
    }
    static bool initializeStatics()
    {
        // Call every member function that has local static data in it:
        English english(0); // Could the compiler ignore this line?
        return true;
    }
};
bool English::s_areStaticsInitialized = initializeStatics();
File: Spanish.h
#pragma once
#include <map>
#include <string>
#include "English.h"
using namespace std;
struct Spanish {
    static bool s_areStaticsInitialized;
    string m_str;
    explicit Spanish(const English& english)
    {
        typedef map<string, string> MapT;
        static MapT* englishToSpanish = new MapT( /*split*/
            []() -> MapT
            {
                MapT englishToSpanish_;
                englishToSpanish_[ English(0).m_str ] = "cero";
                englishToSpanish_[ English(1).m_str ] = "uno";
                englishToSpanish_[ English(2).m_str ] = "dos";
                return englishToSpanish_;
            }());
        m_str = (*englishToSpanish)[english.m_str];
    }
    static bool initializeStatics()
    {
        // Call every member function that has local static data in it:
        Spanish spanish( English(0) ); // Could the compiler ignore this line?
        return true;
    }
};
bool Spanish::s_areStaticsInitialized = initializeStatics();
And here’s the question: Is it possible that some compiler might optimize away those calls to functions (constructors in this case) that have local static data? In other words, what exactly counts as “having side effects”, which to my understanding is what stops the compiler from optimizing a call away? Is having function-local static data enough to make the compiler treat the function call as something it can’t ignore?
Section 1.9 “Program execution” [intro.execution] of the C++11 standard says that
Also, in 3.7.2 “Automatic storage duration” [basic.stc.auto] it is said that
12.8-31 describes copy elision which I believe is irrelevant here.
So the question is whether the initialization of your local variables has side effects that prevent it from being optimized away. Since it initializes a static variable with the address of a dynamically allocated object, I think it produces sufficient side effects (it modifies an object, for example). You can also add an operation on a volatile object there, which introduces observable behavior that cannot be eliminated.
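A minimal sketch of the volatile suggestion, assuming a single translation unit and a trimmed-down English class. The global g_staticsTouched is a hypothetical name introduced only for this illustration; the idea is that the volatile write depends on the constructed object, so the constructor call feeding it has to be performed:

```cpp
#include <string>
#include <vector>

// Hypothetical sink. Accesses to volatile objects are observable behavior,
// so the write below has to be carried out.
volatile bool g_staticsTouched = false;

struct English {
    static bool s_areStaticsInitialized;
    std::string m_str;
    explicit English(int number)
    {
        // Function-local static, built on the first call and never deleted.
        static const std::vector<std::string>* numberToStr =
            new std::vector<std::string>{"zero", "one", "two"};
        m_str = (*numberToStr)[number];
    }
    static bool initializeStatics()
    {
        English english(0);                        // forces the local static to be built
        g_staticsTouched = !english.m_str.empty(); // volatile write that uses the result
        return true;
    }
};

// Dynamic initialization at startup triggers the forced construction.
bool English::s_areStaticsInitialized = English::initializeStatics();
```

Whether the extra volatile write is actually needed is a separate matter. Initializing the function-local static is already a modification of an object, as argued above, so this is more of a belt-and-braces measure.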