I would like to know if anyone can help me with a problem I have while studying a lecture slide from an introductory assembly class that I am taking in school. My problem is not understanding the assembly itself; it is working out how the C source code is ordered based on the assembly. I will post the snippet below, which should make the question clearer.
C Source given:
int arith(int x, int y, int z)
{
    int t1 = x+y;
    int t2 = z+t1;
    int t3 = x+4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}
Assembly given:
arith:
    pushl %ebp
    movl %esp,%ebp
    movl 8(%ebp),%eax
    movl 12(%ebp),%edx
    leal (%edx,%eax),%ecx
    leal (%edx,%edx,2),%edx
    sall $4,%edx
    addl 16(%ebp),%ecx
    leal 4(%edx,%eax),%eax
    imull %ecx,%eax
    movl %ebp,%esp
    popl %ebp
    ret
I am confused about how I am supposed to discern the order of the source lines. For example, z + t1 (z + x + y) is on the second line of the source, but in the assembly it comes after y * 48. Likewise, x + 4 is the third line of the source, but in the assembly it does not even get an instruction of its own; it is folded into the last leal instruction. It makes sense to me when I have the source, but I am supposed to be able to reproduce the source on a test. I do understand that the compiler optimizes things, but if anyone has a way of thinking about this kind of reverse engineering, I would greatly appreciate it if they could walk me through their thought process.
Thanks.
I’ve broken down the disassembly for you to show how the assembly was produced from the C source.
8(%ebp) = x, 12(%ebp) = y, 16(%ebp) = z

Create the stack frame (pushl %ebp, then movl %esp,%ebp).

Move x into eax and y into edx (movl 8(%ebp),%eax and movl 12(%ebp),%edx).

t1 = x + y. leal (load effective address) adds edx and eax, and t1 ends up in ecx (leal (%edx,%eax),%ecx).

int t4 = y * 48; is done in the two steps below: multiply by 3, then by 16. t4 will eventually be in edx.

Multiply edx by 2 and add edx to the result, i.e. edx = edx * 3 (leal (%edx,%edx,2),%edx).

Shift left by 4 bits, i.e. multiply by 16 (sall $4,%edx).

int t2 = z+t1;. ecx initially holds t1 and z is at 16(%ebp); at the end of the instruction ecx holds t2 (addl 16(%ebp),%ecx).

int t5 = t3 + t4;. t3 was simply x + 4, and rather than calculating and storing t3, the expression for t3 is placed inline. This instruction essentially computes (x+4) + t4, which is the same as t3 + t4. It adds edx (t4) and eax (x), and adds 4 as an offset to achieve that result (leal 4(%edx,%eax),%eax).

int rval = t2 * t5;. This one is fairly straightforward: ecx holds t2 and eax holds t5 (imull %ecx,%eax). The return value is passed back to the caller through eax.

Destroy the stack frame and restore esp and ebp (movl %ebp,%esp, then popl %ebp).

Return from the routine (ret).
From this example you can see that the result is the same, but the structure is a bit different. Most likely this code was compiled with some sort of optimization, or someone wrote it by hand like that to demonstrate a point.
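One way to convince yourself of the walkthrough is to transliterate the listing into C, one statement per instruction, and compare it with the original function. The sketch below is only an illustration: arith_from_asm and check_small_inputs are made-up names, the variables eax, ecx and edx merely stand in for the registers, and the shift is written as a multiplication by 16 so that negative inputs stay well defined in C.

```c
#include <assert.h>

/* Original source from the question. */
int arith(int x, int y, int z)
{
    int t1 = x + y;
    int t2 = z + t1;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}

/* Hypothetical transliteration: one C statement per instruction.
 * The local variables only mimic the registers of the same name. */
int arith_from_asm(int x, int y, int z)
{
    int eax = x;            /* movl 8(%ebp),%eax                    */
    int edx = y;            /* movl 12(%ebp),%edx                   */
    int ecx = edx + eax;    /* leal (%edx,%eax),%ecx     t1 = x + y */
    edx = edx + edx * 2;    /* leal (%edx,%edx,2),%edx   y * 3      */
    edx = edx * 16;         /* sall $4,%edx              t4 = y*48  */
    ecx = ecx + z;          /* addl 16(%ebp),%ecx        t2         */
    eax = 4 + edx + eax;    /* leal 4(%edx,%eax),%eax    t5         */
    eax = eax * ecx;        /* imull %ecx,%eax           rval       */
    return eax;             /* result is returned in eax            */
}

/* Compare both versions on a small range of inputs (small enough
 * that no signed overflow can occur). */
void check_small_inputs(void)
{
    for (int x = -5; x <= 5; x++)
        for (int y = -5; y <= 5; y++)
            for (int z = -5; z <= 5; z++)
                assert(arith(x, y, z) == arith_from_asm(x, y, z));
}
```

Calling check_small_inputs() from a main function should complete without tripping an assertion, which is consistent with the claim that only the order of the steps differs, not the result.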
As others have said, you can’t go exactly back to the source from the disassembly. It’s up to the interpretation of the person reading the assembly to come up with equivalent C code.
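To illustrate why the source cannot be recovered exactly, here are two other ways of writing the function that compute the same value as the original. Both are hypothetical reconstructions (the names arith_compact and arith_asm_order are invented for this sketch), and whether a given compiler emits the same instructions for them depends on the compiler version and flags.

```c
/* Hypothetical reconstruction 1: everything in a single expression. */
int arith_compact(int x, int y, int z)
{
    return (x + y + z) * (x + 4 + 48 * y);
}

/* Hypothetical reconstruction 2: statements in the order the
 * instructions appear in the listing rather than the slide's order. */
int arith_asm_order(int x, int y, int z)
{
    int sum = x + y;          /* computed first, like the first leal */
    int scaled = y * 3 * 16;  /* y * 48, split as in the listing     */
    sum += z;                 /* the addl comes after the multiply   */
    return (scaled + x + 4) * sum;
}
```

For a test, any answer along these lines should be defensible as long as it computes the same result as the assembly; the temporaries t1 to t5 on the slide are just one reasonable naming.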
To help with learning assembly and understanding the disassembly of your C programs, you can do the following on Linux:
Compile with debug information (-g), which will embed the source, for example gcc -c -g -m32 arith.c. If you're on a 64-bit machine, the -m32 flag tells the compiler to create a 32-bit binary (I did so for the example below).

Use objdump to dump the object file with its source interleaved, for example objdump -d -S arith.o. -d = disassembly, -S = display source. You can add -M intel-mnemonic to use the Intel ASM syntax if you prefer that over the AT&T syntax that your example uses.

Output:

As you can see, without optimizations the compiler produces a larger binary than the example you have. You can play around with that and add a compiler optimization flag when compiling (i.e. -O1, -O2, -O3). The higher the optimization level, the more abstract the disassembly is going to seem.

For example, with just level 1 optimization (gcc -c -g -O1 -m32 arith.c), the assembly code produced is a lot shorter: