I’m running into inconsistent optimization behavior across compilers for the following code:
```cpp
class tester
{
public:
    tester(int* arr_, int sz_)
        : arr(arr_), sz(sz_)
    {}

    int doadd()
    {
        sm = 0;
        for (int n = 0; n < 1000; ++n)
        {
            for (int i = 0; i < sz; ++i)
            {
                sm += arr[i];
            }
        }
        return sm;
    }

protected:
    int* arr;
    int sz;
    int sm;
};
```
The `doadd` function simulates intensive access to members (ignore the integer overflow in the additions for this question). Compare it with similar code implemented as a free function:
```cpp
int arradd(int* arr, int sz)
{
    int sm = 0;
    for (int n = 0; n < 1000; ++n)
    {
        for (int i = 0; i < sz; ++i)
        {
            sm += arr[i];
        }
    }
    return sm;
}
```
The `doadd` method takes about 1.5 times as long as the `arradd` function when compiled in Release mode with Visual C++ 2008. When I modify `doadd` as follows (copying every member into a local variable):
```cpp
int doadd()
{
    int mysm = 0;
    int* myarr = arr;
    int mysz = sz;
    for (int n = 0; n < 1000; ++n)
    {
        for (int i = 0; i < mysz; ++i)
        {
            mysm += myarr[i];
        }
    }
    sm = mysm;
    return sm;
}
```
the runtimes become roughly equal. Am I right in concluding that this is a missed optimization in the Visual C++ compiler? g++ seems to do better: it runs both the member function and the free function at the same speed when compiling with -O2 or -O3.
For benchmarking, I invoke the `doadd` member and the `arradd` function on a sufficiently large array (a few million integers).
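A minimal timing harness along these lines might look like the sketch below. It is not the original benchmark code: the helper names `time_call` and `make_input`, the use of `std::chrono::steady_clock`, and the alternating 0/1 fill (chosen so that 1000 passes cannot overflow `int` for inputs up to roughly 4.29 million elements) are all assumptions. `arradd` is copied from the question, and returning the sum from the timed call helps keep the compiler from discarding the work.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// arradd, copied from the question.
int arradd(int* arr, int sz)
{
    int sm = 0;
    for (int n = 0; n < 1000; ++n)
        for (int i = 0; i < sz; ++i)
            sm += arr[i];
    return sm;
}

// Hypothetical helper: calls f once and returns {result, elapsed seconds}.
template <typename F>
std::pair<int, double> time_call(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    const int result = f();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {result, elapsed.count()};
}

// Hypothetical input generator: alternating 0/1 values, so 1000 passes over
// n elements sum to 1000 * (n / 2), which fits in int for n up to ~4.29 million.
std::vector<int> make_input(std::size_t n)
{
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<int>(i % 2);
    return v;
}
```

For example, `auto v = make_input(1000000); auto r = time_call([&] { return arradd(v.data(), static_cast<int>(v.size())); });` leaves the sum in `r.first` and the elapsed seconds in `r.second`; a `tester` object's `doadd()` can be timed the same way.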
EDIT: Finer-grained testing shows that the main culprit is the `sm` member. Replacing all the other members with locals still leaves the runtime long, but once I replace `sm` with `mysm`, the runtime becomes equal to that of the function version.
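One plausible reason a compiler might keep `sm` in memory is aliasing: `arr` is an `int*`, and without more information the compiler cannot rule out that it points at the `int` member `sm` itself. In that case, accumulating into the member and accumulating into a local give different results, so the two are not interchangeable. The sketch below demonstrates this with a hypothetical `Summer` struct (public members so a test can aim `arr` at `sm`, and no outer 1000x loop); the names are assumptions, not from the question, and the resolution concludes that aliasing was not the actual cause for MSVC here.

```cpp
// Hypothetical struct with the same inner-loop shape as tester::doadd,
// using public members so arr can be pointed at sm.
struct Summer
{
    int* arr;
    int sz;
    int sm;

    // Accumulates directly into the member, like the original doadd.
    int sum_member()
    {
        sm = 0;
        for (int i = 0; i < sz; ++i)
            sm += arr[i];    // if arr == &sm, this reads the running sum itself
        return sm;
    }

    // Accumulates into a local and stores the member once, like the mysm version.
    int sum_local()
    {
        int local = 0;
        for (int i = 0; i < sz; ++i)
            local += arr[i]; // if arr == &sm, this reads sm's value from before the call
        sm = local;
        return sm;
    }
};
```

Because the two functions disagree when `arr == &sm` (with `sm` initially 5 and `sz == 1`, `sum_member` returns 0 while `sum_local` returns 5), a compiler may only turn the first into the second after proving that no such aliasing occurs, for example after inlining the call into a context where `arr` visibly points at a separate array.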

Resolution
Disappointed with the answers (sorry, guys), I shook off my laziness and dove into the disassembly listings for this code. My answer below summarizes the findings. In short: it has nothing to do with aliasing and everything to do with loop unrolling, and with some strange heuristics MSVC applies when deciding which loop to unroll.
I disassembled the code compiled with MSVC to better understand what’s going on. It turns out that aliasing wasn’t a problem at all, and neither was any kind of paranoid thread safety.
In the disassembly of the `arradd` function, `ecx` points to the array, and the inner loop is unrolled x4: there are four consecutive `add` instructions reading from consecutive addresses, and `ecx` is advanced by 16 bytes (four 4-byte `int`s) on each iteration of the loop.
The disassembly of the unoptimized version of the member function, `doadd`, is harder to find, since the compiler inlined it into `main`. Note two things:
1. The running sum is kept in a register, `edi`. Hence, no aliasing “care” is taken here: the value of `sm` isn’t re-read all the time. `edi` is initialized just once and then used as a temporary. You don’t see it being returned because the compiler optimized that away and used `edi` directly as the return value of the inlined code.
2. The inner loop is not unrolled.

Finally, consider an “optimized” version of the member function, with `mysm` keeping the sum local manually. In its disassembly (again inlined into `main`), the loop is unrolled, but only x2.
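To make the x4 unrolling concrete, here is a source-level sketch of what it amounts to for the inner loop of `arradd`: four additions per iteration, the pointer advancing by four elements (16 bytes for 4-byte `int`s), and a remainder loop for the last `sz % 4` elements. This is written by hand for illustration and is not MSVC’s actual output; the name `arradd_unrolled4` is hypothetical. It keeps a single accumulator, so the additions happen in the same order as in the original loop.

```cpp
// Hypothetical hand-unrolled (x4) variant of arradd, for illustration only.
int arradd_unrolled4(int* arr, int sz)
{
    if (sz <= 0)
        return 0; // the original loops do nothing for sz <= 0

    int sm = 0;
    for (int n = 0; n < 1000; ++n)
    {
        const int* p = arr;
        // End of the prefix whose length is a multiple of 4 elements.
        const int* const end4 = arr + (sz - sz % 4);
        // Main body: four consecutive adds, then advance the pointer by four ints.
        for (; p != end4; p += 4)
        {
            sm += p[0];
            sm += p[1];
            sm += p[2];
            sm += p[3];
        }
        // Remainder: the sz % 4 leftover elements, one at a time.
        for (const int* const end = arr + sz; p != end; ++p)
            sm += *p;
    }
    return sm;
}
```

The x2 unrolling seen in the optimized member function follows the same pattern, with two adds per iteration and a pointer step of two elements.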
This explains my observed speed differences quite well. For an array of size 175e6, the function runs in ~1.2 s, the unoptimized member in ~1.5 s, and the optimized member in ~1.3 s. (Note that this may differ for you; on another machine I got closer runtimes for all three versions.)
What about gcc? Compiled with it, all three versions ran at ~1.5 s. Suspecting a lack of unrolling, I looked at gcc’s disassembly, and indeed, gcc doesn’t unroll any of the versions.
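Current gcc releases offer two ways to request unrolling explicitly, neither of which the comparison above used: the `-funroll-loops` command-line flag, and (since GCC 8) the `#pragma GCC unroll N` directive placed immediately before a loop. The sketch below asks for x4 unrolling of `arradd`’s inner loop with the pragma. Whether it actually helps depends on the compiler version and target, so measure rather than assume; the name `arradd_gcc_unroll` is hypothetical.

```cpp
// Hypothetical variant of arradd that asks GCC (8 or later) to unroll the inner loop by 4.
// Compilers that don't recognize the pragma may warn but still compile the loop normally.
int arradd_gcc_unroll(int* arr, int sz)
{
    int sm = 0;
    for (int n = 0; n < 1000; ++n)
    {
#pragma GCC unroll 4
        for (int i = 0; i < sz; ++i)
        {
            sm += arr[i];
        }
    }
    return sm;
}
```

The pragma only changes how the loop is compiled, not what it computes, so the result is identical to `arradd`’s.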