Here is a very simplified program which reproduces the issue I faced in the real application:
#include "stdlib.h"

typedef unsigned short int shft;

struct xptr {
    int layer;
    void * addr;

    xptr() : layer(0), addr(NULL) {}
    xptr(int _layer, void *_addr) : layer(_layer), addr(_addr) {}
    xptr(const xptr& x) { layer = x.layer; addr = x.addr; }

    /* Uncomment this to remove the bug */
    /* const xptr& operator= (const xptr& rhs)
    {
        if (this != &rhs) {
            this->addr = rhs.addr;
            this->layer = rhs.layer;
        }
        return *this;
    }*/
};

struct n_dsc {
    xptr pdsc;
    xptr ldsc;
    xptr rdsc;
};

void dm_string_value_traverse(xptr node)
{
    xptr p = node;
    while (p.addr != 0)
    {
        p = ((n_dsc*)p.addr)->rdsc;
    }
}

int main()
{
    n_dsc n1, n2;
    n1.rdsc = xptr(0, &n2);
    xptr n = xptr(0, &n1);
    dm_string_value_traverse(n);
}
I cannot understand why CL 2008 (with /Og /Zi /W4) generates the following assembly for the “dm_string_value_traverse” function:
while (p.addr != 0)
004ABAC5 mov eax,dword ptr [esp+10h]
004ABAC9 add esp,0Ch
004ABACC test eax,eax
004ABACE je dm_string_value_traverse+5Fh (4ABADFh)
{
p = ((n_dsc*)p.addr)->rdsc;
004ABAD0 mov ecx,dword ptr [eax+1Ch]
004ABAD3 mov dword ptr [esp],ecx
004ABAD6 mov eax,dword ptr [eax+20h]
004ABAD9 mov dword ptr [esp+4],eax
004ABADD jmp dm_string_value_traverse+50h (4ABAD0h) ;NOTE THIS JMP TO THE ASSIGNMENT
}
}
Note:
- The condition (p.addr != 0) is checked only once! The jmp at 004ABADD goes unconditionally back to 004ABAD0 (the assignment).
- Without /Og, the compiler generates correct code.
- If you uncomment the assignment operator (operator=), the compiler also generates correct code.
Is it possible to understand why this is happening? Is there a way to work around it?
This looks like an overzealous optimizer. I can confirm the behavior you describe with Visual Studio 2008 SP1, using /Og, and /Fa for assembly output. This wouldn’t be the first time for VC: try searching Google for visual c++ “/Og” site:support.microsoft.com.
One workaround is to iterate with an xptr pointer, instead of an xptr value. This also has the beneficial side-effect of reducing the number of bytes copied on each iteration from 8 (xptr value) to 4 (xptr pointer).
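A minimal sketch of what iterating with an xptr pointer could look like, reusing the xptr and n_dsc definitions from the question. The returned node count is an addition for illustration only, so that the traversal can be checked in a test; the function in the question returns void. This sketch has not been checked against the /Og output of any particular compiler version.

```cpp
#include <cstddef>

// Same definitions as in the question (without the commented-out operator=).
struct xptr {
    int   layer;
    void* addr;

    xptr() : layer(0), addr(NULL) {}
    xptr(int _layer, void* _addr) : layer(_layer), addr(_addr) {}
    xptr(const xptr& x) { layer = x.layer; addr = x.addr; }
};

struct n_dsc {
    xptr pdsc;
    xptr ldsc;
    xptr rdsc;
};

// Walk the rdsc chain through a pointer to xptr instead of copying an
// xptr value on every iteration.
// ASSUMPTION: the int return value (number of nodes visited) is added
// here for illustration; it is not part of the original function.
int dm_string_value_traverse(xptr node)
{
    int visited = 0;
    const xptr* p = &node;      // points at the parameter first
    while (p->addr != 0)
    {
        // Advance to the rdsc member of the node that p->addr refers to.
        p = &((n_dsc*)p->addr)->rdsc;
        ++visited;
    }
    return visited;
}
```

For the two-node chain built in main() in the question (n1.rdsc pointing at n2), this version visits two nodes and then stops, because n2.rdsc.addr is NULL.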
I can’t map the resulting assembly code (with /Og) exactly to the source code, as the comparison (p->addr != 0) now happens in two places. It is clear, however, that the loop now includes a test for its end condition.
Given how intractable this class of bug is, there may be similar issues lurking invisibly elsewhere in the code. For the portion of the code that deals with xptr and n_dsc values, you might want to consider going without /Og, or unit-testing it thoroughly.