The Archive Base

Editorial Team
Asked: June 3, 2026 (2026-06-03T10:34:19+00:00)

I have the following two files :- single.cpp :- #include <iostream> #include <stdlib.h> using


I have the following two files.

single.cpp:

#include <iostream>
#include <stdlib.h>

using namespace std;

unsigned long a=0;

class A {
  public:
    virtual int f() __attribute__ ((noinline)) { return a; }
};

class B : public A {
  public:
    virtual int f() __attribute__ ((noinline)) { return a; }
    void g() __attribute__ ((noinline)) { return; }
};

int main() {
  cin>>a;
  A* obj;
  if (a>3)
    obj = new B();
  else
    obj = new A();

  unsigned long result=0;

  for (int i=0; i<65535; i++) {
    for (int j=0; j<65535; j++) {
      result+=obj->f();
    }
  }

  cout<<result<<"\n";
}

and multiple.cpp:

#include <iostream>
#include <stdlib.h>

using namespace std;

unsigned long a=0;

class A {
  public:
    virtual int f() __attribute__ ((noinline)) { return a; }
};

class dummy {
  public:
    virtual void g() __attribute__ ((noinline)) { return; }
};

class B : public A, public dummy {
  public:
    virtual int f() __attribute__ ((noinline)) { return a; }
    virtual void g() __attribute__ ((noinline)) { return; }
};


int main() {
  cin>>a;
  A* obj;
  if (a>3)
    obj = new B();
  else
    obj = new A();

  unsigned long result=0;

  for (int i=0; i<65535; i++) {
    for (int j=0; j<65535; j++) {
      result+=obj->f();
    }
  }

  cout<<result<<"\n";
}

I am using gcc version 3.4.6 with the -O2 flag.

These are the timings I get.

multiple:

real    0m8.635s
user    0m8.608s
sys 0m0.003s

single:

real    0m10.072s
user    0m10.045s
sys 0m0.001s

On the other hand, if I invert the order of the base classes in multiple.cpp, like this:

class B : public dummy, public A {

then I get the following timings, which are slightly slower than for single inheritance, as one might expect given the 'thunk' adjustments to the this pointer that the code then has to make:

real    0m11.516s
user    0m11.479s
sys 0m0.002s

Any idea why this may be happening? As far as the loop is concerned, the generated assembly appears to be identical in all three cases. Is there somewhere else I should look?

Also, I have bound the process to a specific CPU core and I am running it at real-time priority with SCHED_RR.

EDIT: This was noticed by Mysticial and reproduced by me. Adding

cout << "vtable: " << *(void**)obj << endl;

just before the loop in single.cpp makes single as fast as multiple: it clocks in at 8.4 s, just like the public A, public dummy case.

1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 10:34 am (2026-06-03T10:34:21+00:00)

    I think I have at least a partial lead on why this may be happening. The assembly for the loops is exactly identical, but the object files are not!

    For the loop with the cout in front of it, i.e.

    cout << "vtable: " << *(void**)obj << endl;
    
    for (int i=0; i<65535; i++) {
      for (int j=0; j<65535; j++) {
        result+=obj->f();
      }
    }
    

    I get the following in the object file:

    40092d:       bb fe ff 00 00          mov    $0xfffe,%ebx                                       
    400932:       48 8b 45 00             mov    0x0(%rbp),%rax                                     
    400936:       48 89 ef                mov    %rbp,%rdi                                          
    400939:       ff 10                   callq  *(%rax)                                            
    40093b:       48 98                   cltq                                                      
    40093d:       49 01 c4                add    %rax,%r12                                          
    400940:       ff cb                   dec    %ebx                                               
    400942:       79 ee                   jns    400932 <main+0x42>                                 
    400944:       41 ff c5                inc    %r13d                                              
    400947:       41 81 fd fe ff 00 00    cmp    $0xfffe,%r13d                                      
    40094e:       7e dd                   jle    40092d <main+0x3d>                                 
    

    However, without the cout, the source loop is:

    for (int i=0; i<65535; i++) {
      for (int j=0; j<65535; j++) {
        result+=obj->f();
      }
    }
    

    The object file now contains:

    400a54:       bb fe ff 00 00          mov    $0xfffe,%ebx
    400a59:       66                      data16                                                    
    400a5a:       66                      data16 
    400a5b:       66                      data16                                                    
    400a5c:       90                      nop                                                       
    400a5d:       66                      data16                                                    
    400a5e:       66                      data16                                                    
    400a5f:       90                      nop                                                       
    400a60:       48 8b 45 00             mov    0x0(%rbp),%rax                                     
    400a64:       48 89 ef                mov    %rbp,%rdi                                          
    400a67:       ff 10                   callq  *(%rax)
    400a69:       48 98                   cltq   
    400a6b:       49 01 c4                add    %rax,%r12                                          
    400a6e:       ff cb                   dec    %ebx                                               
    400a70:       79 ee                   jns    400a60 <main+0x70>                                 
    400a72:       41 ff c5                inc    %r13d                                              
    400a75:       41 81 fd fe ff 00 00    cmp    $0xfffe,%r13d
    400a7c:       7e d6                   jle    400a54 <main+0x64>                          
    

    So I’d have to say it’s not really due to false aliasing, as Mysticial suggested, but simply due to the NOPs that the assembler emits in response to the compiler's alignment directive.

    The assembly in both cases is:

    .L30:
            movl    $65534, %ebx
            .p2align 4,,7                   
    .L29:
            movq    (%rbp), %rax            
            movq    %rbp, %rdi
            call    *(%rax)
            cltq    
            addq    %rax, %r12                                                                        
            decl    %ebx
            jns     .L29
            incl    %r13d 
            cmpl    $65534, %r13d
            jle     .L30
    

    Now, .p2align 4,,7 pads with NOPs until the address of the next instruction has its low four bits clear (i.e. is a multiple of 16), but only if that takes at most 7 bytes of padding. In the case without the cout, the instruction just after the .p2align would, before padding, sit at

    0x400a59 (low 12 bits: 0b101001011001)
    

    Reaching the next 16-byte boundary, 0x400a60, takes exactly 7 bytes, which is within the limit, so the assembler inserts the padding in the object file.

    On the other hand, in the case with the cout, the instruction just after the .p2align lands at

    0x400932 (low 12 bits: 0b100100110010)
    

    and it would take 14 bytes, more than the 7 allowed, to pad it to a 16-byte boundary. Hence the assembler inserts no padding.

    So the extra time is simply due to the NOPs that the assembler pads the code with (the compiler requests this alignment when compiling with the -O2 flag, for better cache alignment), and not really due to false aliasing.

    I think this resolves the issue. My reference for what .p2align actually does is http://sourceware.org/binutils/docs/as/P2align.html
