I’m writing some optimized C code that basically runs through an array and does something to each element. What it does depends on the element’s current value, something like:
    for (i = 0; i < a_len; i++) {
        if (a[i] == 0) {
            a[i] = f1(a[i]);
        } else if (a[i] % 2 == 0) {
            a[i] = f2(a[i]);
        } else {
            a[i] = 0;
        }
    }
I’m returning to C after many years working in dynamic languages, where my practice has been to try to write straightforward code and not create lots of local variables for things that I can just refer to directly, like a[i] above. I am very much aware that best practices are to write readable code and trust that the compiler is smarter than you and will do good optimizations.
If I were writing the code above in assembler, I would load a[i] into a register once and then just use that value each time, because I know that a[] is private memory and won’t change between references. However, even a smart compiler might do a load every time because it can’t be sure that the memory hasn’t changed. (Or would I have to explicitly declare ‘a’ volatile for the compiler not to make this optimization?)
So, my question is: should I expect better performance by rewriting with a local variable like so:
    for (i = 0; i < a_len; i++) {
        val = a[i];
        if (val == 0) {
            a[i] = f1(val);
        } else if (val % 2 == 0) {
            a[i] = f2(val);
        } else {
            a[i] = 0;
        }
    }
Or does stuff like -O3 take care of this automatically for me? The code I’m optimizing takes days to run, so even modest improvements will make a difference.
The functions f1 and f2 seem to share the same signature. How differently do they behave? Do you really need the check outside, or can you embed the logic in one function?
If you have an if-else ladder rather than just two such functions, try an array of function pointers instead: use the value of a[i] to index into that array and call the correct function.
Hand-optimization often turns out to be error-prone micro-optimization. It’s best to leave this task to the compiler. If you really need to optimize, look at the big picture: think about the algorithms, the design, the layering, etc.
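The function-pointer table mentioned above might look like the following sketch. The bodies of f1 and f2 are made-up placeholders, and classify, to_zero, handlers, and process are hypothetical names introduced only for illustration. Since the three cases here (zero, nonzero even, odd) are not directly usable as indices, the sketch first maps each value to a small category number.

```c
#include <stddef.h>

/* Placeholder bodies: the question does not show what f1 and f2 do. */
static int f1(int v) { return v + 1; }
static int f2(int v) { return v / 2; }

/* Hypothetical handler for the "else" case, which stores 0. */
static int to_zero(int v) { (void)v; return 0; }

typedef int (*handler_fn)(int);

/* Table indexed by the category returned from classify(). */
static const handler_fn handlers[] = { f1, f2, to_zero };

/* Map a value to a table index: 0 = zero, 1 = nonzero even, 2 = odd. */
static size_t classify(int v)
{
    if (v == 0) {
        return 0;
    }
    return (v % 2 == 0) ? 1 : 2;
}

/* Apply the matching handler to every element of the array. */
void process(int *a, size_t a_len)
{
    for (size_t i = 0; i < a_len; i++) {
        int val = a[i];
        a[i] = handlers[classify(val)](val);
    }
}
```

Note that classify still branches, and an indirect call can be harder for the compiler to inline than a direct one, so with only three cases this may well be no faster than the plain if-else. The table tends to be worthwhile mainly when the value itself, or something cheaply derived from it, can serve as the index.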
As for your question: yes, most compilers are likely to optimize out the repeated memory read, as long as a[i] is not declared volatile.
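One way to check this for a particular compiler is to put both versions side by side and compare the generated assembly (for example with gcc -O2 -S). The sketch below does that under some assumptions: the bodies of f1 and f2 are placeholders, and process_direct and process_local are hypothetical names. In both loops nothing is stored or called between the reads of a[i] on a given path, so an optimizing compiler is free to keep the value in a register.

```c
#include <stddef.h>

/* Placeholder bodies: the question does not show what f1 and f2 do.
   They are defined in this file, so the compiler can see (and inline) them. */
static int f1(int v) { return v + 1; }
static int f2(int v) { return v / 2; }

/* Version from the question that refers to a[i] directly each time. */
void process_direct(int *a, size_t a_len)
{
    for (size_t i = 0; i < a_len; i++) {
        if (a[i] == 0) {
            a[i] = f1(a[i]);
        } else if (a[i] % 2 == 0) {
            a[i] = f2(a[i]);
        } else {
            a[i] = 0;
        }
    }
}

/* Version that copies a[i] into a local variable first. */
void process_local(int *a, size_t a_len)
{
    for (size_t i = 0; i < a_len; i++) {
        int val = a[i];
        if (val == 0) {
            a[i] = f1(val);
        } else if (val % 2 == 0) {
            a[i] = f2(val);
        } else {
            a[i] = 0;
        }
    }
}
```

If the two functions compile to the same instructions, the local variable is purely a readability choice. The picture can change if a call to a function the compiler cannot see into sits between two reads of a[i], since the compiler may then have to assume the array was modified and reload it; in that situation the explicit local makes the intent unambiguous.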