I compiled the following C code:
typedef struct {
    long x, y, z;
} Foo;

long Bar(Foo *f, long i)
{
    return f[i].x + f[i].y + f[i].z;
}
with the command gcc -S -O3 test.c. Here is the Bar function in the output:
    .section __TEXT,__text,regular,pure_instructions
    .globl _Bar
    .align 4, 0x90
_Bar:
Leh_func_begin1:
    pushq %rbp
Ltmp0:
    movq %rsp, %rbp
Ltmp1:
    leaq (%rsi,%rsi,2), %rcx
    movq 8(%rdi,%rcx,8), %rax
    addq (%rdi,%rcx,8), %rax
    addq 16(%rdi,%rcx,8), %rax
    popq %rbp
    ret
Leh_func_end1:
I have a few questions about this assembly code:
- What is the purpose of “pushq %rbp”, “movq %rsp, %rbp”, and “popq %rbp”, if neither rbp nor rsp is used in the body of the function?
- Why do rsi and rdi automatically contain the arguments to the C function (i and f, respectively) without reading them from the stack?
- I tried increasing the size of Foo to 88 bytes (11 longs), and the leaq instruction was replaced with an imulq:
  imulq $88, %rsi, %rcx
  Would it make sense to design my structs to have “rounder” sizes to avoid the multiply instructions (in order to optimize array access)?
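The function is simply building its own stack frame with these instructions: the prologue saves the caller's rbp and points rbp at the current frame, and the epilogue restores rbp before returning. The body does not need them, but keeping rbp as a frame pointer lets debuggers and stack unwinders walk the chain of call frames. There's nothing really unusual about them. Note, though, that because this function is so small, it will probably be inlined wherever it is called from the same file (or with link-time optimization), and then the prologue disappears. The compiler still has to emit a standalone version of Bar, because it has external linkage and may be called from other translation units. Also, see what @ouah said in his answer.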
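This is because that's how the AMD64 ABI specifies that arguments are passed to functions: the first six integer or pointer arguments go, from left to right, in rdi, rsi, rdx, rcx, r8, and r9, and only further arguments are passed on the stack. So f (the first argument) arrives in rdi, i (the second) arrives in rsi, and the result is returned in rax (see page 20 of the AMD64 ABI, Draft 0.99.5, September 3, 2010).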
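As a rough sketch of that rule, the hypothetical helper int_arg_location below maps an argument's position to where it arrives. It is deliberately simplified: it covers only integer and pointer arguments and ignores floating-point arguments (passed in xmm registers), structs passed by value, and varargs details.

```c
#include <assert.h>
#include <string.h>

/* Simplified System V AMD64 rule: integer and pointer arguments are
 * assigned, left to right, to these six registers. Floating-point args,
 * structs passed by value and varargs are NOT modelled here. */
static const char *const int_arg_regs[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Hypothetical helper: where the n-th (0-based) integer/pointer
 * argument arrives under the simplified rule above. */
const char *int_arg_location(int n)
{
    assert(n >= 0);
    if (n < (int)(sizeof int_arg_regs / sizeof int_arg_regs[0]))
        return int_arg_regs[n];
    return "stack";   /* the 7th and later integer args go on the stack */
}

/* Nonzero if the n-th integer/pointer argument is passed in a register. */
int is_register_arg(int n)
{
    return strcmp(int_arg_location(n), "stack") != 0;
}
```

For Bar(Foo *f, long i), int_arg_location(0) gives "rdi" for f and int_arg_location(1) gives "rsi" for i, which matches the registers the assembly reads.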
This comes down to how the compiler computes the byte offset of f[i], which is i * sizeof(Foo). If the structure is 24 bytes, f is the address of the array containing the structures, and i is the index at which the array has to be accessed, then the byte offset of each structure is i*24. Multiplying by 24 is achieved here by a combination of lea and SIB (scale-index-base) addressing. The lea instruction computes i*3 into rcx, and every subsequent instruction multiplies that i*3 by 8 through the addressing mode's scale factor, which gives the needed byte offset i*24, and then uses an immediate displacement to reach the individual structure members: (%rdi,%rcx,8), 8(%rdi,%rcx,8), and 16(%rdi,%rcx,8). If you make the structure 88 bytes, the offset is i*88 = (i*11)*8. The scale of 8 still fits the addressing mode, but a single lea can only multiply by 2, 3, 4, 5, 8, or 9, so producing i*11 would take at least two dependent lea instructions. The compiler presumably judged that a single imulq computing i*88 directly is cheaper than (or no worse than) a series of shifts, adds, leas, or anything else.
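To make the address arithmetic concrete, here is a hand translation of the -O3 listing back into C. BarByHand and times11_two_leas are hypothetical names for illustration only; the sketch assumes Foo has no padding (so sizeof(Foo) == 3 * sizeof(long)), and the register names in the comments assume an LP64 target such as the one in the question. The second function shows one possible two-lea decomposition of i*11 that an 88-byte struct would need; it is not actual compiler output.

```c
#include <assert.h>
#include <stddef.h> /* offsetof */

typedef struct {
    long x, y, z;
} Foo;

/* The C version from the question, for comparison. */
long Bar(Foo *f, long i)
{
    return f[i].x + f[i].y + f[i].z;
}

/* The "3*i, then scale" trick relies on Foo being exactly three longs. */
static_assert(sizeof(Foo) == 3 * sizeof(long), "Foo must have no padding");

/* Hypothetical hand translation of the -O3 assembly. */
long BarByHand(Foo *f, long i)
{
    const char *base = (const char *)f;   /* %rdi: start of the array      */
    long rcx = i + i * 2;                 /* leaq (%rsi,%rsi,2), %rcx: 3*i */
    long scale = (long)sizeof(long);      /* the ",8" scale factor on LP64 */
    long rax = *(const long *)(base + rcx * scale + offsetof(Foo, y)); /* 8(%rdi,%rcx,8)  */
    rax += *(const long *)(base + rcx * scale + offsetof(Foo, x));     /* (%rdi,%rcx,8)   */
    rax += *(const long *)(base + rcx * scale + offsetof(Foo, z));     /* 16(%rdi,%rcx,8) */
    return rax;
}

/* For an 88-byte struct, i*88 == (i*11)*8. One lea can multiply only by
 * 2, 3, 4, 5, 8 or 9, so 11*i needs two dependent leas (illustrative
 * sequence); the final *8 could still come from the addressing mode. */
long times11_two_leas(long i)
{
    long t = i + i * 4;   /* like leaq (%rsi,%rsi,4), %rax: 5*i  */
    return i + t * 2;     /* like leaq (%rsi,%rax,2), %rcx: 11*i */
}
```

Both versions read the same three members; the hand translation only makes explicit that the 24-byte stride is split into a lea for the factor of 3 and the addressing mode's scale for the factor of 8.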