In the book Game Coding Complete, 3rd Edition, the author mentions a technique to both reduce data structure size and increase access performance. In essence it relies on the fact that you gain performance when member variables are memory aligned. This is an obvious potential optimization that compilers would take advantage of, but by making sure each variable is aligned they end up bloating the size of the data structure.
Or that was his claim at least.
The real performance increase, he states, comes from using your brain and ensuring that your structure is laid out to take advantage of the speed increase while preventing the compiler bloat. He provides the following code snippet:
#pragma pack( push, 1 )
struct SlowStruct
{
    char c;
    __int64 a;
    int b;
    char d;
};
struct FastStruct
{
    __int64 a;
    int b;
    char c;
    char d;
    char unused[ 2 ]; // fill to 8-byte boundary for array use
};
#pragma pack( pop )
Using the above struct objects in an unspecified test he reports a performance increase of 15.6% (222ms compared to 192ms) and a smaller size for the FastStruct. This all makes sense on paper to me, but it fails to hold up under my testing:

Same timing results and the same size (accounting for the char unused[ 2 ])!
Now if the #pragma pack( push, 1 ) is isolated only to FastStruct (or removed completely) we do see a difference:
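The layout behind these two outcomes can be checked directly. The following is a minimal sketch, not the code from the book or from my test: `std::int64_t` stands in for the MSVC-specific `__int64`, and the `Packed*`/`Natural*` names and the `print_layouts` helper are made up for illustration. It assumes a 4-byte `int`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Same member order as SlowStruct/FastStruct, compiled with 1-byte packing.
#pragma pack(push, 1)
struct PackedSlow { char c; std::int64_t a; int b; char d; };
struct PackedFast { std::int64_t a; int b; char c; char d; char unused[2]; };
#pragma pack(pop)

// Same member order again, but with the compiler's default alignment rules.
struct NaturalSlow { char c; std::int64_t a; int b; char d; };
struct NaturalFast { std::int64_t a; int b; char c; char d; char unused[2]; };

// With 1-byte packing there is no padding, so the sizes are just the member sums
// (assumption: int is 4 bytes on the target).
static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");
static_assert(sizeof(PackedSlow) == 14, "1 + 8 + 4 + 1");
static_assert(sizeof(PackedFast) == 16, "8 + 4 + 1 + 1 + 2");
static_assert(offsetof(PackedSlow, a) == 1, "the 64-bit member starts on an odd byte");

// Hypothetical helper: prints size and offset of `a` for each variant.
void print_layouts() {
    std::printf("PackedSlow : size=%zu offsetof(a)=%zu\n", sizeof(PackedSlow), offsetof(PackedSlow, a));
    std::printf("PackedFast : size=%zu offsetof(a)=%zu\n", sizeof(PackedFast), offsetof(PackedFast, a));
    std::printf("NaturalSlow: size=%zu offsetof(a)=%zu\n", sizeof(NaturalSlow), offsetof(NaturalSlow, a));
    std::printf("NaturalFast: size=%zu offsetof(a)=%zu\n", sizeof(NaturalFast), offsetof(NaturalFast, a));
}
```

Calling `print_layouts()` from `main` shows the padding the compiler inserts when packing is off. The packed sizes are fixed by the static assertions; the unpacked size of the slow layout depends on the ABI (typically 24 bytes on 64-bit targets), which is where the size difference comes from.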

So, finally, here lies the question: Do modern compilers (VS2010 specifically) already optimize for byte alignment, hence the lack of performance increase (while increasing the structure size as a side effect, like Mike McShaffry stated)? Or is my test not intensive enough, or too inconclusive, to return any significant results?
For the tests I did a variety of tasks, from math operations and column-major multi-dimensional array traversing/checking to matrix operations, on the unaligned __int64 member. None of them produced different results for either structure.
In the end, even if there were no performance increase, this is still a useful tidbit to keep in mind for keeping memory usage to a minimum. But I would love it if there were a performance boost (no matter how minor) that I am just not seeing.
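For reference, a test of this kind can be sketched as below. This is not the benchmark I actually ran: the `Timing` struct, the `time_updates` function, and the workload (a read-modify-write of the 64-bit member) are assumptions chosen for illustration, and `std::int64_t` replaces `__int64` so it builds outside MSVC.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

#pragma pack(push, 1)
struct SlowStruct { char c; std::int64_t a; int b; char d; };
struct FastStruct { std::int64_t a; int b; char c; char d; char unused[2]; };
#pragma pack(pop)

// Result of one timed run: a checksum (so the work cannot be optimized away
// and both layouts can be compared for correctness) plus elapsed time.
struct Timing {
    std::int64_t checksum;
    double milliseconds;
};

// Hypothetical harness: repeatedly updates the 64-bit member `a` of every
// element in a contiguous array of T and measures how long that takes.
template <typename T>
Timing time_updates(std::size_t count, int passes) {
    std::vector<T> items(count);  // elements are value-initialized (all zero)
    const auto start = std::chrono::steady_clock::now();
    for (int p = 0; p < passes; ++p) {
        for (std::size_t i = 0; i < count; ++i) {
            items[i].a += static_cast<std::int64_t>(i) + p;
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    std::int64_t checksum = 0;
    for (const T& item : items) {
        checksum += item.a;
    }
    return {checksum, std::chrono::duration<double, std::milli>(stop - start).count()};
}
```

Comparing `time_updates<SlowStruct>(n, passes)` against `time_updates<FastStruct>(n, passes)` with a large `n` gives the two timings. The checksums should match, while the timings depend on the CPU, the compiler, and the optimization flags, so a single run says little.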
It is highly dependent on the hardware.
Let me demonstrate:
Core i7 920 @ 3.5 GHz
Okay, not much difference. But it’s still consistent over multiple runs.
So the alignment makes a small difference on Nehalem Core i7.
Intel Xeon X5482 Harpertown @ 3.2 GHz (Core 2 – generation Xeon)
Now take a look…
6.2x faster!!!
Conclusion:
You see the results. You decide whether or not it’s worth your time to do these optimizations.
EDIT:
Same benchmarks but without the #pragma pack:
Core i7 920 @ 3.5 GHz
Intel Xeon X5482 Harpertown @ 3.2 GHz
misalignment without trouble for this benchmark.
Taken from my comment:
If you leave out the #pragma pack, the compiler will keep everything aligned so you don’t see this issue. So this is actually an example of what could happen if you misuse #pragma pack.
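To make the misalignment concrete, here is a minimal sketch that counts how many elements of an array have their 64-bit member at an address that is not a multiple of 8. The struct names and the `count_misaligned_a` helper are hypothetical, and `std::int64_t` is used in place of `__int64`.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct PackedSlow { char c; std::int64_t a; int b; char d; };                 // 14 bytes, a at offset 1
struct PackedFast { std::int64_t a; int b; char c; char d; char unused[2]; }; // 16 bytes, a at offset 0
#pragma pack(pop)

// Hypothetical helper: returns the number of array elements whose `a` member
// does not start on an 8-byte boundary. The address is computed from the
// element address plus offsetof, so no pointer to a packed member is formed.
template <typename T, std::size_t N>
std::size_t count_misaligned_a(const T (&arr)[N]) {
    std::size_t misaligned = 0;
    for (std::size_t i = 0; i < N; ++i) {
        const std::uintptr_t addr =
            reinterpret_cast<std::uintptr_t>(&arr[i]) + offsetof(T, a);
        if (addr % 8 != 0) {
            ++misaligned;
        }
    }
    return misaligned;
}
```

If the array itself starts on an 8-byte boundary (for example, declared with `alignas(8)`), element i of the packed slow layout has `a` at byte 14*i + 1, which is always odd, so every element is misaligned. The packed fast layout is 16 bytes with `a` first, so every element stays aligned, which is what the `unused[ 2 ]` padding buys for array use.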