Often when writing code, I find myself using the value of a particular function call multiple times. I realized that an obvious optimization would be to capture these repeatedly used values in variables.
This (pseudo code):
function foo(x){ return x + 1; }
...
do_something(foo(1));
do_something_else(foo(1));
Becomes:
function foo(x){ return x + 1; }
...
bar = foo(1);
do_something(bar);
do_something_else(bar);
However, doing this explicitly makes code less readable in my experience. I assumed that compilers could not do this kind of optimization if our language of choice allows functions to have side-effects.
Recently I looked into this, and if I understand correctly, this optimization is/can be done for languages where functions must be pure. That does not surprise me, but supposedly this can also be done for impure functions. With a few quick Google searches I found these snippets:
GCC 4.7 Fortran improvement
When performing front-end-optimization, the -faggressive-function-elimination option allows the removal of duplicate function calls even for impure functions.
Compiler Optimization (Wikipedia)
For example, in some languages functions are not permitted to have side effects. Therefore, if a program makes several calls to the same function with the same arguments, the compiler can immediately infer that the function’s result need be computed only once. In languages where functions are allowed to have side effects, another strategy is possible. The optimizer can determine which function has no side effects, and restrict such optimizations to side effect free functions. This optimization is only possible when the optimizer has access to the called function.
From my understanding, this means that an optimizer can determine when a function is or is not pure, and perform this optimization when the function is. I say this because if a function always produces the same output when given the same input, and is side effect free, it would fulfill both conditions to be considered pure.
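As a concrete illustration of those two conditions, here is a minimal C++ sketch. The function names and the global `g_counter` are made up for this example and do not come from the quoted sources. `add1` is pure, so two calls with the same argument can be merged into one. `next_id` is neither deterministic nor free of side effects, so the same rewrite changes what the program does.

```cpp
// Pure: the result depends only on the argument; nothing else is touched.
int add1(int x) { return x + 1; }

// Hypothetical impure function: every call modifies global state
// and returns a different value.
int g_counter = 0;
int next_id() { return ++g_counter; }

// Two calls to a pure function...
int sum_pure_twice(int x)  { return add1(x) + add1(x); }
// ...can be replaced by a single call whose result is reused.
int sum_pure_merged(int x) { int t = add1(x); return t + t; }

// Applying the same rewrite to an impure function is not valid:
// the two versions return different sums and leave g_counter
// in different states.
int sum_ids_twice()  { return next_id() + next_id(); }
int sum_ids_merged() { int t = next_id(); return t + t; }
```

Starting from `g_counter == 0`, `sum_ids_twice()` returns 3 and leaves the counter at 2, while `sum_ids_merged()` returns 2 and leaves it at 1.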
These two snippets raise two questions for me.
- How can a compiler be able to safely make this optimization if a function is not pure? (as in -faggressive-function-elimination)
- How can a compiler determine whether a function is pure or not? (as in the strategy suggested in the Wikipedia article)
and finally:
- Can this kind of optimization be applied to any language, or only when certain conditions are met?
- Is this optimization a worthwhile one even for extremely simple functions?
- How much overhead does storing and retrieving a value from the stack incur?
I apologize if these are stupid or illogical questions. They are just some things I have been curious about lately. 🙂
Disclaimer: I’m not a compiler/optimizer guy. I only have a tendency to peek at the generated code and like to read about that stuff, so this is not authoritative. A quick search didn’t turn up much on -faggressive-function-elimination, so it might do some extra magic not explained here.
An optimizer can
Modifying your example a bit, and doing it in C++:
Resolves to (pseudocode)
(Note: reading from and writing to a volatile variable is an “observable side effect”, that the optimizer must preserve in the same order given by the code.)
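The listings for this example are not reproduced here. As a rough stand-in, and explicitly not the original code, the following sketch shows the kind of experiment described: a trivial `foo` called twice with the same argument, with `volatile` variables (the names `input`, `out_a` and `out_b` are assumptions) so that the optimizer cannot fold the whole thing away. The second function is what an optimizer is allowed to turn the first one into. Whether a given compiler actually does so has to be checked in its generated code.

```cpp
// Hypothetical stand-ins. Reads and writes of volatile objects are
// observable side effects that the optimizer must keep, in order.
volatile int input = 1;
volatile int out_a = 0;
volatile int out_b = 0;

int foo(int x) { return x + 1; }

// As written in the source: two calls with the same argument.
void as_written() {
    int x = input;    // one volatile read
    out_a = foo(x);   // first volatile write
    out_b = foo(x);   // second volatile write
}

// A permitted transformation: foo inlined, x + 1 computed once,
// the volatile read and both volatile writes preserved.
void as_transformed() {
    int x = input;
    int t = x + 1;    // common subexpression
    out_a = t;
    out_b = t;
}
```

Both functions perform the same volatile accesses in the same order, which is all the language requires the compiler to preserve here.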
Modifying the example for foo to have a side effect generates the following pseudocode:
We observe that common subexpression elimination is still done, and is kept separate from the side effects. Inlining and reordering allow the compiler to separate the side effects from the “pure” part.
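To illustrate that separation by hand, here is a minimal sketch, not the tested code. It assumes `foo` adds its argument to a global named `accu` (the name appears in this answer; the exact side effect is an assumption) and then returns `x + 1`. The second function shows a transformation that keeps both side effects but computes the pure part only once.

```cpp
int accu = 0;  // global modified by foo; its exact use is an assumption

int foo(int x) {
    accu += x;       // side effect: must happen once per call
    return x + 1;    // "pure" part: same result for the same x
}

// As written: two calls with the same argument.
int twice_as_written(int x) {
    int a = foo(x);
    int b = foo(x);
    return a + b;
}

// After inlining and reordering: side effects preserved,
// common subexpression x + 1 evaluated once.
int twice_transformed(int x) {
    accu += x;       // side effect of the first call
    accu += x;       // side effect of the second call
    int t = x + 1;   // shared pure computation
    return t + t;
}
```

With `accu` reset to 0 before each call, both versions return 8 for `x = 3` and leave `accu` at 6, so the rewrite is not observable from outside.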
Note that the compiler reads and eagerly writes back to accu, which wouldn’t be necessary. I’m not sure of the rationale here.
To conclude:
A compiler does not need to test for purity. It can identify side effects that need to be preserved, and then transform the rest to its liking.
Such optimizations are worthwhile, even for trivial functions, because, among others,
The overhead for a stack memory access is usually ~1 cycle, since the top of the stack is usually in Level 1 cache already. Note that the “usually” should be in bold: it can be “even better”, since the read/write may be optimized away, or it can be worse, since the increased pressure on L1 cache may push some other important data back to L2.
Where’s the limit?
Theoretically, compile time. In practice, predictability and correctness of the optimizer are additional tradeoffs.
All tests with VC2008, default optimization settings for “Release” build.