Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6914769
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 27, 20262026-05-27T09:23:25+00:00 2026-05-27T09:23:25+00:00

Consider this code under gcc 4.5.1 (Ubuntu 10.04, intel core2duo 3.0 Ghz) It’s just

  • 0

Consider this code under gcc 4.5.1 (Ubuntu 10.04, intel core2duo 3.0 Ghz)
It’s just 2 tests, in the first one I make a direct call on virtual fucnion and in the second I call it via a Wrapper class :

test.cpp

#define ITER 100000000

class Print{

public:

typedef Print* Ptr;

virtual void print(int p1, float p2, float p3, float p4){/*DOES NOTHING */}

};

class PrintWrapper
{

    public:

      typedef PrintWrapper* Ptr;

      PrintWrapper(Print::Ptr print, int p1, float p2, float p3, float p4) :
      m_print(print), _p1(p1),_p2(p2),_p3(p3),_p4(p4){}

      ~PrintWrapper(){}

      void execute()
      { 
        m_print->print(_p1,_p2,_p3,_p4); 
      }

    private:

      Print::Ptr m_print;
      int _p1;
      float _p2,_p3,_p4;

};

 Print::Ptr p = new Print();
 PrintWrapper::Ptr pw = new PrintWrapper(p, 1, 2.f,3.0f,4.0f);

void test1()
{

 //-------------test 1-------------------------

 for (auto var = 0; var < ITER; ++var) 
 {
   p->print(1, 2.f,3.0f,4.0f);
 }

 }

 void test2()
 {

  //-------------test 2-------------------------

 for (auto var = 0; var < ITER; ++var) 
 {
   pw->execute();
 }

}

int main() 
{ 
  test1(); 
  test2();
}

I profiled it with gprof and objdump :

g++ -c -std=c++0x -pg -g -O2 test.cpp
objdump -d -M intel -S test.o > objdump.txt
g++ -pg test.o -o test
./test
gprof test > gprof.output

in gprof.output I observed that test2() takes more time than test1() but I can’t explain it

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 49.40      0.41     0.41        1   410.00   540.00  test2()
 31.33      0.67     0.26 200000000     0.00     0.00  Print::print(int, float, float, float)
 19.28      0.83     0.16        1   160.00   290.00  test1()
  0.00      0.83     0.00        1     0.00     0.00  global constructors keyed to p

The assembly code in objdump.txt doesn’t help me either :

 //-------------test 1-------------------------
 for (auto var = 0; var < ITER; ++var) 
  15:   83 c3 01                add    ebx,0x1
 {
   p->print(1, 2.f,3.0f,4.0f);
  18:   8b 10                   mov    edx,DWORD PTR [eax]
  1a:   c7 44 24 10 00 00 80    mov    DWORD PTR [esp+0x10],0x40800000
  21:   40 
  22:   c7 44 24 0c 00 00 40    mov    DWORD PTR [esp+0xc],0x40400000
  29:   40 
  2a:   c7 44 24 08 00 00 00    mov    DWORD PTR [esp+0x8],0x40000000
  31:   40 
  32:   c7 44 24 04 01 00 00    mov    DWORD PTR [esp+0x4],0x1
  39:   00 
  3a:   89 04 24                mov    DWORD PTR [esp],eax
  3d:   ff 12                   call   DWORD PTR [edx]

  //-------------test 2-------------------------
 for (auto var = 0; var < ITER; ++var) 
  65:   83 c3 01                add    ebx,0x1

      ~PrintWrapper(){}

      void execute()
      { 
        m_print->print(_p1,_p2,_p3,_p4); 
  68:   8b 10                   mov    edx,DWORD PTR [eax]
  6a:   8b 70 10                mov    esi,DWORD PTR [eax+0x10]
  6d:   8b 0a                   mov    ecx,DWORD PTR [edx]
  6f:   89 74 24 10             mov    DWORD PTR [esp+0x10],esi
  73:   8b 70 0c                mov    esi,DWORD PTR [eax+0xc]
  76:   89 74 24 0c             mov    DWORD PTR [esp+0xc],esi
  7a:   8b 70 08                mov    esi,DWORD PTR [eax+0x8]
  7d:   89 74 24 08             mov    DWORD PTR [esp+0x8],esi
  81:   8b 40 04                mov    eax,DWORD PTR [eax+0x4]
  84:   89 14 24                mov    DWORD PTR [esp],edx
  87:   89 44 24 04             mov    DWORD PTR [esp+0x4],eax
  8b:   ff 11                   call   DWORD PTR [ecx]

How can we explain such a difference ?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-27T09:23:26+00:00Added an answer on May 27, 2026 at 9:23 am

    In test2(), the program must first load pw from the heap, then call pw->execute() (which incurs call overhead), then load pw->m_print as well as the _p1 through _p4 arguments, then load the vtable pointer for pw, then load the vtable slot for pw->Print, then call pw->Print. Because the compiler can’t see through the virtual call, it then must assume all of these values have changed for the next iteration, and reload them all.

    In test(), the arguments are inline in the code segment, and we only have to load p, the vtable pointer, and the vtable slot. We’ve saved five loads this way. This could easily account for the time difference.

    In short – the loads of pw->m_print and pw->_p1 through pw->_p4 are the culprit here.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Consider this code... using System.Threading; //... Timer someWork = new Timer( delegate(object state) {
consider this code block public void ManageInstalledComponentsUpdate() { IUpdateView view = new UpdaterForm(); BackgroundWorker
Consider this code (Java, specifically): public int doSomething() { doA(); try { doB(); }
Consider this code: new Ajax.Request('?service=example', { parameters : {tags : 'exceptions'}, onSuccess : this.dataReceived.bind(this)
Consider this code: public <T> List<T> meth(List<?> type) { System.out.println(type); // 1 return new
Consider this code: import socket store = [] scount = 0 while True: scount+=1
Consider this code: int main() { int e; prn(e); return 0; } void prn(double
Consider this code: void res(int a,int n) { printf(%d %d, ,a,n); } void main(void)
Consider this code: int x = 17; int y = 013; System.out.println(x+y = +
Consider this code : class MyClass<T> { } class AnotherClass : MyClass<String> { }

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.