Updated: The actual resolution was that a different compile box had served my compile request. In the slower instance I was running code compiled on a SuSE 9 box but executed on a SuSE 10 box. That was enough of a difference for me to discard that result and compare apples to apples. When using the same compile box, the results were as follows:
g++ was about two percent slower
delta real: 4 minutes, delta user: 4 minutes, delta system: 5 seconds
Thanks!
gcc v4.3 vs. g++ v4.3, reduced to the simplest case and compiled with nothing but simple flags:
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    int i=0;
    int j=0;
    int k=0;
    int m=0;
    int n=0;
    for (i=0;i<1000;i++)
        for (j=0;j<6000;j++)
            for (k=0;k<12000;k++)
            {
                m = i+j+k;
                n=(m+1+1);
            }
    return 0;
}
Is this a known issue? The 15% difference is very reproducible and shows up across the board for real, system, and user time. I have to wait until tomorrow to post the assembly.
Update: I have only tried this on one of my compile boxes. I am using SuSE 10.
When compiled with gcc and g++ the only difference I see is within the first 4 lines.
gcc:
g++:
As you can see, the only difference is that with g++ the alignment (2) occurs on a word boundary. This tiny difference seems to be what causes the significant performance difference.
Here is a page explaining structure alignment. Although it is written for ARM/NetWinder, it is still applicable, as it discusses how alignment works on modern CPUs. You will want to read section 7 specifically, ‘What are the disadvantages of word alignment?’:
http://netwinder.osuosl.org/users/b/brianbr/public_html/alignment.html
and here is a reference on the .align operation:
http://www.nersc.gov/vendor_docs/ibm/asm/align.htm
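To make "aligning to a boundary" concrete, here is a minimal C sketch of the arithmetic an assembler performs when it pads up to an alignment boundary. The helper names align_up and padding_for are hypothetical, invented for this illustration, and the sketch assumes the boundary is a power of two, which is the usual case for alignment directives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr up to the next multiple of boundary.
 * Assumes boundary is a non-zero power of two (2, 4, 8, ...). */
static uintptr_t align_up(uintptr_t addr, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    /* Adding boundary-1 and clearing the low bits rounds up. */
    return (addr + boundary - 1) & ~(boundary - 1);
}

/* Hypothetical helper: number of padding bytes inserted before addr
 * so that the next item starts on the requested boundary. */
static uintptr_t padding_for(uintptr_t addr, uintptr_t boundary)
{
    return align_up(addr, boundary) - addr;
}
```

For example, an item that would otherwise start at offset 13 is moved to offset 16 on a 4-byte boundary (3 bytes of padding) but only to offset 14 on a 2-byte boundary (1 byte of padding), so a different alignment directive can shift everything that follows it.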
Benchmarks as requested:
gcc:
g++:
I reduced the inner-most iteration to 1200. The results aren’t as far apart as I had hoped, but then again the assembly output was generated on Windows and the timings were done on Linux. Maybe MinGW does something different behind the scenes, alignment-wise, than gcc for Linux does.
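If you want to repeat the comparison without relying on the shell's time command, the loop nest can be timed from inside the program. This is only a sketch: run_loops and time_loops are hypothetical names, the loop bounds are passed as parameters so the reduced count is easy to try, and clock() is used on the assumption that CPU time is the figure of interest.

```c
#include <assert.h>
#include <time.h>

/* Same loop nest as in the question, with the bounds as parameters.
 * Returns the last value of n so the work is not trivially unused. */
static int run_loops(int ni, int nj, int nk)
{
    int i, j, k, m = 0, n = 0;
    for (i = 0; i < ni; i++)
        for (j = 0; j < nj; j++)
            for (k = 0; k < nk; k++) {
                m = i + j + k;
                n = m + 1 + 1;
            }
    return n;
}

/* CPU seconds consumed by one call to run_loops, measured with clock(). */
static double time_loops(int ni, int nj, int nk)
{
    clock_t start = clock();
    /* clock() returns (clock_t)-1 when processor time is unavailable. */
    assert(start != (clock_t)-1);
    volatile int sink = run_loops(ni, nj, nk); /* keep the result live */
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_loops(1000, 6000, 1200) from main in both the gcc and the g++ build gives one number per build to compare. Note that clock() reports processor time, so it corresponds roughly to user plus system time rather than real time, and that an optimizing build may simplify the loops away entirely.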