The Archive Base


Editorial Team
Asked: May 26, 2026

While looking at some questions on optimization, this accepted answer for the question on coding practices for most effective use of the optimizer piqued my curiosity. The assertion is that local variables should be used for computations in a function, not output arguments. It was suggested this would allow the compiler to make additional optimizations otherwise not possible.

So I wrote a simple bit of code for the example Foo class and compiled the code fragments with g++ v4.4 and -O2 to get assembler output (using -S). Only the loop portion of each assembler listing is shown below. On examination of the output, the loop is nearly identical for both, with just a difference in one address: a pointer to the output argument in the first example, and to the local variable in the second.

There seems to be no change in the actual effect whether the local variable is used or not. So the question breaks down into three parts:

a) is GCC not doing additional optimization, even given the hint suggested;

b) is GCC successfully optimizing in both cases, but should not be;

c) is GCC successfully optimizing in both cases, and is producing compliant output as defined by the C++ standard?

Here is the unoptimized function:

void DoSomething(const Foo& foo1, const Foo* foo2, int numFoo, Foo& barOut)
{
    for (int i = 0; i < numFoo; i++)
    {
         barOut.munge(foo1, foo2[i]);
    }
}

And corresponding assembly:

.L3:
    movl    (%esi), %eax
    addl    $1, %ebx
    addl    $4, %esi
    movl    %eax, 8(%esp)
    movl    (%edi), %eax
    movl    %eax, 4(%esp)
    movl    20(%ebp), %eax       ; Note address is that of the output argument
    movl    %eax, (%esp)
    call    _ZN3Foo5mungeES_S_
    cmpl    %ebx, 16(%ebp)
    jg      .L3

Here is the re-written function:

void DoSomethingFaster(const Foo& foo1, const Foo* foo2, int numFoo, Foo& barOut)
{
    Foo barTemp = barOut;
    for (int i = 0; i < numFoo; i++)
    {
         barTemp.munge(foo1, foo2[i]);
    }
    barOut = barTemp;
}

And here is the compiler output for the function using a local variable:

.L3:
    movl    (%esi), %eax          ; Load foo2[i] (a 4-byte Foo, passed by value) into EAX
    addl    $1, %ebx              ; increment i
    addl    $4, %esi              ; advance the foo2 pointer by sizeof(Foo), 4 bytes here
    movl    %eax, 8(%esp)         ; store foo2[i] in its argument slot (careful! from EAX, not ESI)
    movl    (%edi), %eax          ; Load foo1 (through the reference) into EAX
    movl    %eax, 4(%esp)         ; store foo1 in its argument slot
    leal    -28(%ebp), %eax       ; Load barTemp pointer into EAX
    movl    %eax, (%esp)          ; PUSH the this pointer for barTemp
    call    _ZN3Foo5mungeES_S_    ; munge()!
    cmpl    %ebx, 16(%ebp)        ; i < numFoo
    jg      .L3                   ; recall incrementing i by one coming into the loop
                                  ; so test if greater
1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 4:39 am

    The example given in that answer was not a very good one because of the call to an unknown function the compiler cannot reason much about. Here’s a better example:

    void FillOneA(int *array, int length, int& startIndex)
    {
        for (int i = 0; i < length; i++) array[startIndex + i] = 1;
    }
    
    void FillOneB(int *array, int length, int& startIndex)
    {
        int localIndex = startIndex;
        for (int i = 0; i < length; i++) array[localIndex + i] = 1;
    }
    

    The first version optimizes poorly because it needs to protect against the possibility that somebody called it as

    int array[10] = { 0 };
    FillOneA(array, 5, array[1]);
    

    resulting in {1, 1, 0, 1, 1, 1, 0, 0, 0, 0 } since the iteration with i=1 modifies the startIndex parameter.

    The second one doesn’t need to worry about the possibility that the array[localIndex + i] = 1 will modify localIndex because localIndex is a local variable whose address has never been taken.
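The aliasing case can be checked directly. The sketch below copies FillOneA and FillOneB from above and adds two helpers, RunAliasedA and RunAliasedB (hypothetical names, not part of the original answer), that call each version with startIndex bound to array[1]:

```cpp
#include <array>

// FillOneA and FillOneB are copied unchanged from the answer.
void FillOneA(int *array, int length, int& startIndex)
{
    for (int i = 0; i < length; i++) array[startIndex + i] = 1;
}

void FillOneB(int *array, int length, int& startIndex)
{
    int localIndex = startIndex;
    for (int i = 0; i < length; i++) array[localIndex + i] = 1;
}

// Hypothetical helper: startIndex aliases a[1], so the write in the
// iteration with i = 1 changes startIndex from 0 to 1 mid-loop.
std::array<int, 10> RunAliasedA()
{
    std::array<int, 10> a{};        // all zeros
    FillOneA(a.data(), 5, a[1]);
    return a;                       // {1, 1, 0, 1, 1, 1, 0, 0, 0, 0}
}

// Hypothetical helper: same call, but FillOneB read startIndex once
// (as 0) before the loop, so later writes to a[1] do not shift the index.
std::array<int, 10> RunAliasedB()
{
    std::array<int, 10> a{};
    FillOneB(a.data(), 5, a[1]);
    return a;                       // {1, 1, 1, 1, 1, 0, 0, 0, 0, 0}
}
```

The two functions are therefore not interchangeable for every caller. That difference is exactly why the compiler has to reload startIndex on each iteration of FillOneA.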

    In assembly (Intel notation, because that’s what I use):

    FillOneA:
        mov     edx, [esp+8]
        xor     eax, eax
        test    edx, edx
        jle     $b
        push    esi
        mov     esi, [esp+16]
        push    edi
        mov     edi, [esp+12]
    $a: mov     ecx, [esi]
        add     ecx, eax
        inc     eax
        mov     [edi+ecx*4], 1
        cmp     eax, edx
        jl      $a
        pop     edi
        pop     esi
    $b: ret
    
    FillOneB:
        mov     ecx, [esp+8]
        mov     eax, [esp+12]
        mov     edx, [eax]
        test    ecx, ecx
        jle     $a
        mov     eax, [esp+4]
        push    edi
        lea     edi, [eax+edx*4]
        mov     eax, 1
        rep stosd
        pop     edi
    $a: ret
    

    ADDED: Here’s an example where the compiler’s insight is into Bar, and not munge:

    class Bar
    {
    public:
        float getValue() const
        {
            return valueBase * boost;
        }
    
    private:
        float valueBase;
        float boost;
    };
    
    class Foo
    {
    public:
        void munge(float adjustment);
    };
    
    void Adjust10A(Foo& foo, const Bar& bar)
    {
        for (int i = 0; i < 10; i++)
            foo.munge(bar.getValue());
    }
    
    void Adjust10B(Foo& foo, const Bar& bar)
    {
        Bar localBar = bar;
        for (int i = 0; i < 10; i++)
            foo.munge(localBar.getValue());
    }
    

    The resulting code is

    Adjust10A:
        push    ecx
        push    ebx
        mov     ebx, [esp+12] ;; foo
        push    esi
        mov     esi, [esp+20] ;; bar
        push    edi
        mov     edi, 10
    $a: fld     [esi+4] ;; bar.boost (second member, offset 4)
        push    ecx
        fmul    [esi] ;; boost * bar.valueBase
        mov     ecx, ebx
        fstp    [esp+16]
        fld     [esp+16]
        fstp    [esp]
        call    Foo::munge
        dec     edi
        jne     $a
        pop     edi
        pop     esi
        pop     ebx
        pop     ecx
        ret     0
    
    Adjust10B:
        sub     esp, 8
        mov     ecx, [esp+16] ;; bar
        mov     eax, [ecx] ;; bar.valueBase
        mov     [esp], eax ;; localBar.valueBase
        fld     [esp] ;; localBar.valueBase
        mov     eax, [ecx+4] ;; bar.boost
        mov     [esp+4], eax ;; localBar.boost
        fmul    [esp+4] ;; localBar.getValue()
        push    esi
        push    edi
        mov     edi, [esp+20] ;; foo
        fstp    [esp+24]
        fld     [esp+24] ;; cache localBar.getValue()
        mov     esi, 10 ;; loop counter
    $a: push    ecx
        mov     ecx, edi ;; foo
        fstp    [esp] ;; use cached value
        call    Foo::munge
        fld     [esp]
        dec     esi
        jne     $a ;; loop
        pop     edi
        fstp    ST(0)
        pop     esi
        add     esp, 8
        ret     0
    

    Observe that the inner loop in Adjust10A must recalculate the value since it must protect against the possibility that foo.munge changed bar.
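To see why the compiler cannot rule that out, here is a minimal sketch in which munge really does change bar. It assumes a variant of the classes above: the constructor, setBoost, getTotal, the Bar pointer stored in Foo, and the two TotalAfter... helpers are hypothetical additions for illustration only.

```cpp
// Hypothetical variant of Bar: a constructor and a setter are added.
class Bar
{
public:
    Bar(float base, float boostFactor) : valueBase(base), boost(boostFactor) {}
    float getValue() const { return valueBase * boost; }
    void setBoost(float newBoost) { boost = newBoost; }  // assumption, not in the original

private:
    float valueBase;
    float boost;
};

// Hypothetical variant of Foo: it remembers a Bar and modifies it in munge().
class Foo
{
public:
    explicit Foo(Bar* observed) : target(observed), total(0.0f) {}

    void munge(float adjustment)
    {
        total += adjustment;      // accumulate what we were given
        target->setBoost(0.0f);   // side effect on the Bar the caller also sees
    }

    float getTotal() const { return total; }

private:
    Bar* target;
    float total;
};

// Adjust10A and Adjust10B are unchanged from the answer.
void Adjust10A(Foo& foo, const Bar& bar)
{
    for (int i = 0; i < 10; i++)
        foo.munge(bar.getValue());
}

void Adjust10B(Foo& foo, const Bar& bar)
{
    Bar localBar = bar;
    for (int i = 0; i < 10; i++)
        foo.munge(localBar.getValue());
}

// Hypothetical helpers: foo modifies the same Bar that is passed as bar.
float TotalAfterAdjust10A()
{
    Bar bar(2.0f, 3.0f);
    Foo foo(&bar);
    Adjust10A(foo, bar);          // sees boost drop to 0 after the first call
    return foo.getTotal();        // 6 + nine times 0
}

float TotalAfterAdjust10B()
{
    Bar bar(2.0f, 3.0f);
    Foo foo(&bar);
    Adjust10B(foo, bar);          // works on a private copy, always passes 6
    return foo.getTotal();        // ten times 6
}
```

With this setup the two versions produce different totals. Copying to a local is therefore a change in meaning that the programmer chooses, not a transformation the compiler may apply on its own.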

    That said, this style of optimization is not a slam dunk. (For example, we could’ve gotten the same effect by manually caching bar.getValue() into localValue.) It tends to be most helpful for vectorized operations, since those can be parallelized.
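A minimal sketch of the manual caching mentioned in the parenthetical, assuming a trivial Foo whose munge just accumulates its argument. Adjust10C, the Bar constructor, getTotal, and TotalAfterAdjust10C are hypothetical names added for illustration.

```cpp
class Bar
{
public:
    // Hypothetical constructor so the example can build a Bar.
    Bar(float base, float boostFactor) : valueBase(base), boost(boostFactor) {}
    float getValue() const { return valueBase * boost; }

private:
    float valueBase;
    float boost;
};

class Foo
{
public:
    Foo() : total(0.0f) {}
    void munge(float adjustment) { total += adjustment; }  // assumed behavior
    float getTotal() const { return total; }

private:
    float total;
};

// Cache the computed value, rather than a copy of the whole Bar, in a local.
// localValue's address is never taken, so no call can modify it.
void Adjust10C(Foo& foo, const Bar& bar)
{
    const float localValue = bar.getValue();
    for (int i = 0; i < 10; i++)
        foo.munge(localValue);
}

// Hypothetical helper that exercises Adjust10C.
float TotalAfterAdjust10C()
{
    Bar bar(2.0f, 3.0f);
    Foo foo;
    Adjust10C(foo, bar);
    return foo.getTotal();   // ten times 2 * 3
}
```

Like Adjust10B, this states in the source that the value is read once, but it avoids copying members of Bar that the loop never uses.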
