The Archive Base Latest Questions

Editorial Team
Asked: June 13, 2026


As was advised a long time ago, I always build my release executables without frame pointers (which is the default if you compile with /Ox).

However, I have now read in the paper http://research.microsoft.com/apps/pubs/default.aspx?id=81176 that frame pointers don’t have much of an effect on performance. So optimizing fully (using /Ox) or optimizing fully with frame pointers (using /Ox /Oy-) doesn’t really make a difference to performance.

Microsoft seems to indicate that adding frame pointers (/Oy-) makes debugging easier, but is this really the case?

I did some experiments and noticed that:

  • in a simple 32-bit test executable (compiled using /Ox /Ob0) the omission of frame pointers does increase performance (by about 10%). But this test executable only performs some function calls, nothing else.
  • in my own application, adding or removing frame pointers doesn’t seem to have a big effect. Adding frame pointers seems to make the application about 5% faster, but that could be within the error margin.
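A call-heavy micro-benchmark of the kind described in the first bullet might look like the sketch below. It is not the asker's test program: the names add_one, call_target, run_calls and time_calls are hypothetical, and the volatile function pointer is just one portable way to stop the optimizer from inlining the call.

```c
#include <time.h>

/* Hypothetical tiny leaf function: almost all of its cost is call overhead. */
static unsigned add_one(unsigned x) { return x + 1u; }

/* Calling through a volatile function pointer keeps the compiler from
   inlining the call, so the prologue/epilogue cost stays in the loop. */
static unsigned (*volatile call_target)(unsigned) = add_one;

/* Performs `calls` function calls and returns the accumulated value. */
unsigned run_calls(unsigned calls)
{
    unsigned acc = 0;
    unsigned i;
    for (i = 0; i < calls; ++i)
        acc = call_target(acc);
    return acc;
}

/* Times run_calls() with clock(); returns the elapsed CPU time in seconds
   and stores the computed value in *result so the work cannot be discarded. */
double time_calls(unsigned calls, unsigned *result)
{
    clock_t start = clock();
    *result = run_calls(calls);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Building the same file once with cl /Ox /Ob0 and once with cl /Ox /Ob0 /Oy-, then comparing time_calls() over a few hundred million calls and several runs, gives a rough idea of the call-overhead difference. A loop like this exaggerates the effect compared with a real application.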

What is the general advice regarding frame pointers?

  • should they be omitted (/Ox) in a release executable because they really have a positive effect on performance?
  • should they be added (/Ox /Oy-) in a release executable because they improve debuggability (when debugging with a crash-dump file)?

Using Visual Studio 2010.

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Added an answer on June 13, 2026 at 12:25 pm

    Phoronix tested the performance downside of -O2 -fno-omit-frame-pointer with x86-64 GCC 12.1 on a Zen 3 laptop CPU for multiple open-source programs, as proposed for Fedora 37. Most of them had performance regressions, a few of them very serious, although the biggest ones are probably some kind of fluke or other interaction. Geometric mean slowdown of 14% (including those possible outliers).


    Short answer: By omitting the frame pointer,

      • You need to use the stack pointer to access local variables and arguments. The compiler doesn’t mind, but if you are coding in assembler, this makes your life slightly harder, and much harder if you don’t use macros.
      • You save four bytes (32-bit architecture) of stack space per function call. Unless you are using deep recursion, this isn’t a win.
      • You save a memory write to cached memory (the stack) and you (theoretically) save a few clock ticks on function entry/exit, but you can increase the code size. Unless your function is doing very little very often (in which case it should be inlined), this shouldn’t be noticeable.
      • You free up a general-purpose register. If the compiler can utilize the register, it will produce code that is both substantially smaller and potentially faster. But if most of the CPU time is spent talking to main memory (or even the hard drive), omitting the frame pointer is not going to save you from that.
      • The debugger will lose an easy way to generate the stack trace. The debugger might still be able to generate the stack trace from a different source (such as a PDB file).


    Long answer:

    The typical function entry and exit with a frame pointer is (16-bit processor):

    PUSH BP   ;push the base pointer (frame pointer)
    MOV BP,SP ;store the stack pointer in the frame pointer
    SUB SP,xx ;allocate space for local variables et al.
    ...
    LEAVE     ;restore the stack pointer and pop the old frame pointer
    RET       ;return from the function
    

    An entry and exit without a frame pointer could look like (32-bit processor):

    SUB ESP,xx ;allocate space for local variables et al.
    ...
    ADD ESP,xx ;de-allocate space for local variables et al.
    RET        ;return from the function.
    

    You will save two instructions, but you also duplicate a literal value, so the code doesn’t get shorter (quite the opposite, especially with [esp+xx] addressing modes taking an extra byte vs. [ebp+xx]), but you might have saved a few clock cycles (or not, if it causes a cache miss in the instruction cache). You did save some space on the stack, though.
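The extra byte for [esp+xx] addressing can be seen in the 32-bit machine encodings. The sketch below stores the bytes of one load instruction in each form; the array and function names are hypothetical, and the offset 8 is chosen only for illustration.

```c
#include <stddef.h>

/* 32-bit x86 encodings of the same load, once relative to EBP, once to ESP.
   ESP as a base register always needs a SIB byte (0x24 here), EBP does not. */
static const unsigned char load_via_ebp[] = { 0x8B, 0x45, 0x08 };       /* mov eax,[ebp+8] */
static const unsigned char load_via_esp[] = { 0x8B, 0x44, 0x24, 0x08 }; /* mov eax,[esp+8] */

/* Size in bytes of the EBP-relative load. */
size_t ebp_load_size(void) { return sizeof load_via_ebp; }

/* Size in bytes of the ESP-relative load. */
size_t esp_load_size(void) { return sizeof load_via_esp; }

/* Extra code bytes when `accesses` such stack accesses use ESP instead of EBP. */
size_t extra_bytes_for_esp(size_t accesses)
{
    return accesses * (esp_load_size() - ebp_load_size());
}
```

A function that touches its stack frame a dozen times therefore pays about a dozen extra code bytes, which can outweigh the instructions saved in the prologue and epilogue.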


    You do free up a general-purpose register. This has only benefits.

    In regcall/fastcall, this is one extra register where you can store arguments to your function. Thus, if your function takes seven (on x86; more on most other architectures) or more arguments (including this), the seventh argument still fits into a register. (Although most calling conventions don’t pass that many in registers, e.g., two for MS fastcall, three for GCC regparm(3) on 32-bit x86. Up to six integer register arguments on x86-64 System V, or 4 register arguments on most RISC processors.)

    The same, more importantly, applies to local variables as well. Arrays and large objects don’t fit into registers (but pointers to them do), but if your function is using seven different local variables (including temporary variables needed to calculate complex expressions), chances are the compiler will be able to produce smaller code. Smaller code means lower instruction cache footprint, which means reduced miss rate and thus even less memory access (but Intel Atom has a 32K instruction cache, meaning that your code will probably fit anyway).

    The x86 architecture features the [BX/BP/SI/DI] and [BX/BP + SI/DI] addressing modes. This makes the BP register an extremely useful place for a scaled array index, especially if the array pointer resides in the SI or DI registers. Two offset registers are better than one.

    Utilising a register avoids memory access, but if a variable is worth storing in a register, chances are it will survive just as fine in an L1 cache (especially since it’s going to be on the stack). There is still the cost of moving to/from the cache, but since modern CPUs do a lot of move optimisation and parallelisation, it is possible that an L1 access would be just as fast as a register access. Thus, the speed benefit from not moving data around is still present, but not as enormous. I can easily imagine the CPU avoiding the data cache completely, at least as far as reading is concerned (and writing to cache can be done in parallel).

    A register that is utilised is a register that needs preserving. It is not worth storing much in the registers if you are going to push it to the stack anyway before you use it again. In preserve-by-caller calling conventions (such as the one above), this means that registers as persistent storage are not as useful in a function that calls other functions a lot.

    See “What are callee and caller saved registers?” for more about how calling conventions are designed with a mix of call-clobbered and call-preserved registers to give compilers a good mix of each, so functions have some scratch registers for temporaries that don’t need to live across function calls, but also some registers that callees will preserve. See also “Why make some registers caller-saved and others callee-saved? Why not make the caller save everything it wants saved?”

    Also note that x86 has a separate register space for floating point registers, meaning that floats cannot utilise the BP register without extra data movement instructions anyway. Only integers and memory pointers do.


    You do lose debuggability by omitting frame pointers. This answer shows why:

    If the code crashes, all the debugger needs to do to generate the stack trace is:

        PUSH BP      ; log the current frame pointer as well
    $1: CALL log_BP  ; log the frame pointer currently on stack
        LEAVE        ; pop the frame pointer to get the next one
        CMP [BP+4],0
        JNZ $1       ; until the stack cannot be popped (the return address is some specific value)
    

    If the code crashes without a frame pointer, the debugger might not have any way to generate the stack trace, because it might not know (namely, it needs to locate the function entry/exit point) how much needs to be subtracted from the stack pointer. If the debugger doesn’t know the frame pointer is not being used, it might even crash itself.
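The frame-pointer walk that a debugger performs can be modelled in portable C. The sketch below is a simulation, not real stack inspection: struct frame, walk_frames, demo_backtrace and the return addresses 0x1000 to 0x3000 are all hypothetical. Each record stands in for the pair that PUSH BP / MOV BP,SP leaves on the stack (the saved caller frame pointer, with the return address next to it).

```c
#include <stddef.h>

/* Simulated frame record: the saved caller frame pointer plus the
   return address stored next to it on the stack. */
struct frame {
    const struct frame *saved_fp; /* caller's frame; NULL marks the outermost frame */
    unsigned return_address;      /* hypothetical code address, illustration only */
};

/* Follows the saved-frame-pointer chain starting at `fp`, storing up to
   `max` return addresses in `out`. Returns the number of frames visited. */
size_t walk_frames(const struct frame *fp, unsigned *out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_address;
        fp = fp->saved_fp; /* the "LEAVE" step: move to the caller's frame */
    }
    return n;
}

/* Builds a three-deep hypothetical call chain (outermost -> f -> g)
   and walks it starting from the innermost frame. */
size_t demo_backtrace(unsigned *out, size_t max)
{
    struct frame outer_frame = { NULL, 0x1000u };
    struct frame f_frame = { &outer_frame, 0x2000u };
    struct frame g_frame = { &f_frame, 0x3000u };
    return walk_frames(&g_frame, out, max);
}
```

Without the saved_fp link there is nothing to follow from one frame to the next, which is why a debugger then has to fall back on separate unwind metadata or on analysing the code to find each frame's size.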

    Modern debug-info formats have metadata that still allows stack backtraces in optimized code where the compiler defaults to not using [E/R]BP as a frame pointer. Compilers know how to use assembler directives to create this extra metadata, or write it directly in the object file, not in the parts that normally get mapped into memory. If you don’t do this for hand-written assembly, then debuggability will suffer, especially for crashes in functions called by a hand-written assembly function.
