I have:
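#include <cstddef>
#include <vector>

struct DoubleVec {
    std::vector<double> data;
    explicit DoubleVec(std::size_t n) : data(n) {}
    std::size_t size() const { return data.size(); }
    double& operator[](std::size_t i) { return data[i]; }
    double operator[](std::size_t i) const { return data[i]; }
};

// Element-wise sum; assumes lhs.size() == rhs.size().
DoubleVec operator+(const DoubleVec& lhs, const DoubleVec& rhs) {
    DoubleVec ans(lhs.size());
    for (std::size_t i = 0; i < lhs.size(); ++i) {
        ans[i] = lhs[i] + rhs[i];
    }
    return ans;
}

DoubleVec someFunc(DoubleVec a, DoubleVec b, DoubleVec c, DoubleVec d) {
    DoubleVec ans = a + b + c + d;
    return ans;
}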
In the above, "a + b + c + d" creates three temporary DoubleVecs. Is there a way to optimize this away with some kind of template magic, i.e., to reduce it to something equivalent to:
DoubleVec ans(a.size());
for (std::size_t i = 0; i < ans.size(); ++i)
    ans[i] = a[i] + b[i] + c[i] + d[i];
You can assume all DoubleVecs have the same number of elements.
The high-level idea is to do some kind of templated magic on "+" that delays the computation until the "=". At that point it looks at the whole expression, sees that it is just adding these numbers, and synthesizes a[i] + b[i] + c[i] + d[i] instead of all the temporary variables.
Yep, that’s exactly what expression templates (see http://www.drdobbs.com/184401627 or http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template for example) are for.
The idea is to make operator+ return some kind of proxy object which represents the expression tree to be evaluated. Then operator= is written to take such an expression tree and evaluate it all at once, avoiding the creation of temporaries, and applying any other optimizations that may be applicable.
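To make the proxy idea concrete, here is a minimal sketch of how it could look for DoubleVec. The names VecExpr and VecSum are hypothetical, invented for this illustration, and DoubleVec is assumed to have size(), operator[] and the constructors shown. It is one possible layout (a CRTP base plus a sum node), not a definitive implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical CRTP base: tags a type as an element-wise expression.
template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Hypothetical proxy node for "lhs + rhs". It only stores references;
// no element is added until operator[] is called.
template <typename L, typename R>
class VecSum : public VecExpr<VecSum<L, R>> {
    const L& lhs_;
    const R& rhs_;
public:
    VecSum(const L& lhs, const R& rhs) : lhs_(lhs), rhs_(rhs) {
        assert(lhs.size() == rhs.size());
    }
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }
    std::size_t size() const { return lhs_.size(); }
};

struct DoubleVec : VecExpr<DoubleVec> {
    std::vector<double> data;

    explicit DoubleVec(std::size_t n) : data(n) {}
    DoubleVec(std::initializer_list<double> init) : data(init) {}

    // Construct from any expression tree: one loop, one allocation.
    template <typename E>
    DoubleVec(const VecExpr<E>& expr) : data(expr.self().size()) {
        const E& e = expr.self();
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
    }

    // Assign from any expression tree, again in a single loop.
    template <typename E>
    DoubleVec& operator=(const VecExpr<E>& expr) {
        const E& e = expr.self();
        data.resize(e.size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }

    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// operator+ builds a node of the expression tree instead of a DoubleVec.
template <typename L, typename R>
VecSum<L, R> operator+(const VecExpr<L>& lhs, const VecExpr<R>& rhs) {
    return VecSum<L, R>(lhs.self(), rhs.self());
}

DoubleVec someFunc(const DoubleVec& a, const DoubleVec& b,
                   const DoubleVec& c, const DoubleVec& d) {
    // Builds VecSum<VecSum<VecSum<DoubleVec, DoubleVec>, DoubleVec>, DoubleVec>,
    // then evaluates a[i] + b[i] + c[i] + d[i] inside the constructor's loop.
    DoubleVec ans = a + b + c + d;
    return ans;
}
```

One caveat with this sketch: the nodes hold references, so the expression must be consumed within the same full expression. Something like "auto e = a + b + c;" would leave e referring to a destroyed inner VecSum temporary. Storing nested nodes by value is a common way around that.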