I have created this code:
#include <stdio.h>
typedef unsigned int uint;
uint in[2]={1,2},out[2]={3,4};
int main() {
in[0]=out[0]/10;
}
and compiled it with GCC (v4.4.5, no optimizations) on Linux. The resulting assembly is:
0000000000400474 <main>:
400474: 55 push rbp
400475: 48 89 e5 mov rbp,rsp
400478: 8b 05 ae 03 20 00 mov eax,DWORD PTR [rip+0x2003ae] # 60082c <out>
40047e: 89 45 fc mov DWORD PTR [rbp-0x4],eax
400481: ba cd cc cc cc mov edx,0xcccccccd
400486: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
400489: f7 e2 mul edx
40048b: 89 d0 mov eax,edx
40048d: c1 e8 03 shr eax,0x3
400490: 89 05 8e 03 20 00 mov DWORD PTR [rip+0x20038e],eax # 600824 <in>
400496: c9 leave
400497: c3 ret
400498: 90 nop
400499: 90 nop
40049a: 90 nop
40049b: 90 nop
40049c: 90 nop
40049d: 90 nop
40049e: 90 nop
40049f: 90 nop
Now, the question is: what is the code doing on line #5?
40047e: 89 45 fc mov DWORD PTR [rbp-0x4],eax
Isn’t it storing the value it just loaded from out[0] back to some other place in memory? Why? I didn’t tell it to read the value and immediately write it to another location.
This temporary variable then appears again at address 400486, on line #7:
400486: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
It looks like GCC is producing very inefficient code in this example, and these temporary stores could evict a cache line. Please confirm; maybe there is something I am not getting.
GCC generates very inefficient code when compiling at -O0. What you’re seeing is basically a raw translation of its internal representation of the program. This internal representation includes a number of temporary variables, and your load/store pair is a value passing through such a temporary. At higher optimization levels these useless loads and stores will mostly be eliminated; at -O0, however, even the simplest analyses are disabled.
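To make the temporary visible at the source level, here is a rough C model of the listing in the question. It is only a sketch of how the instructions can be read, not GCC's actual internal representation: the function names are made up for illustration, and it assumes a 32-bit unsigned int, as on the x86-64 target shown. The first function mirrors the -O0 sequence, including the store to and reload from the stack slot at [rbp-0x4]; the second is the same arithmetic with the temporary removed, which is roughly what the optimizer is free to do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mirrors the -O0 instruction sequence step by step. */
static uint32_t div10_unoptimized_model(uint32_t value)
{
    uint32_t tmp = value;        /* mov [rbp-0x4],eax : store into a stack temporary */
    uint32_t reloaded = tmp;     /* mov eax,[rbp-0x4] : load it straight back */
    uint64_t product = (uint64_t)reloaded * 0xCCCCCCCDu; /* mul edx : 64-bit result in edx:eax */
    uint32_t high = (uint32_t)(product >> 32);           /* mov eax,edx : keep the high half */
    return high >> 3;            /* shr eax,0x3 */
}

/* Hypothetical helper: the same computation with the temporary eliminated. */
static uint32_t div10_without_temporary(uint32_t value)
{
    return (uint32_t)(((uint64_t)value * 0xCCCCCCCDu) >> 35);
}

/* Both models should agree with plain division by 10 on these samples. */
static void check_models(void)
{
    const uint32_t samples[] = {0u, 3u, 9u, 10u, 99u, 1000u, 4294967295u};
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(div10_unoptimized_model(samples[i]) == samples[i] / 10u);
        assert(div10_without_temporary(samples[i]) == samples[i] / 10u);
    }
}
```

Calling check_models() from main should return without tripping an assertion. The two functions compute the same result; the only difference is the round trip through tmp, which is the load/store pair the question asks about.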