See above. I wrote two sample functions:
source.ll:
define i32 @bleh(i32 %x) {
entry:
  %addtmp = add i32 %x, %x
  %addtmp1 = add i32 %addtmp, %x
  %addtmp2 = add i32 %addtmp1, %x
  %addtmp3 = add i32 %addtmp2, %x
  %addtmp4 = add i32 %addtmp3, 1
  %addtmp5 = add i32 %addtmp4, 2
  %addtmp6 = add i32 %addtmp5, 3
  %multmp = mul i32 %x, 3
  %addtmp7 = add i32 %addtmp6, %multmp
  ret i32 %addtmp7
}
source-fp.ll:
define double @bleh(double %x) {
entry:
  %addtmp = fadd double %x, %x
  %addtmp1 = fadd double %addtmp, %x
  %addtmp2 = fadd double %addtmp1, %x
  %addtmp3 = fadd double %addtmp2, %x
  %addtmp4 = fadd double %addtmp3, 1.000000e+00
  %addtmp5 = fadd double %addtmp4, 2.000000e+00
  %addtmp6 = fadd double %addtmp5, 3.000000e+00
  %multmp = fmul double %x, 3.000000e+00
  %addtmp7 = fadd double %addtmp6, %multmp
  ret double %addtmp7
}
Why is it that when I optimize both functions using
opt -O3 source[-fp].ll -o opt.source[-fp].ll -S
the i32 one gets optimized but the double one doesn’t? I expected the fadd instructions to be combined into a single fmul. Instead, the output looks exactly the same as the input.
Is it due to the flags being set differently? I am aware of certain optimizations that are possible for i32 that are not doable for double. But the absence of simple constant folding is beyond my understanding.
I am using LLVM 3.1.
It’s not quite true to say that no optimization is possible. I’ll go through the first few lines to show where transformations are and are not allowed:
%addtmp = fadd double %x, %x
This first line could safely be transformed to fmul double %x, 2.0e+0, but that’s not actually an optimization on most architectures (fadd is generally as fast as or faster than fmul, and doesn’t require producing the constant 2.0). Note that barring overflow, this operation is exact (like all scaling by powers of two).
%addtmp1 = fadd double %addtmp, %x
This line could be transformed to fmul double %x, 3.0e+0. Why is this a legal transformation? Because the computation that produced %addtmp was exact, so only a single rounding is incurred whether this is computed as x * 3 or x + x + x. Because these are IEEE-754 basic operations and therefore correctly rounded, the result is the same either way. What about overflow? Neither may overflow unless the other does as well.
%addtmp2 = fadd double %addtmp1, %x
This is the first line that cannot be legally transformed into constant * x. 4 * x would compute exactly, without any rounding, whereas x + x + x + x incurs two roundings: x + x + x is rounded once, then adding x may round a second time.
%addtmp3 = fadd double %addtmp2, %x
Ditto here; 5 * x would incur one rounding; x + x + x + x + x incurs three.
The only line that might be beneficially transformed would be replacing x + x + x with 3 * x. However, the subexpression x + x is already present elsewhere, so an optimizer easily could choose not to employ this transform (since it can take advantage of the existing partial result if it does not).
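The two exactness claims above (x + x equals 2 * x, and x + x + x equals 3 * x) can be spot-checked outside LLVM. The following is a minimal Python sketch, assuming Python floats are IEEE-754 doubles under the default round-to-nearest mode, which holds on common platforms. The helper names sum_of_copies, count_mismatches, and fold_constants_mismatch are made up for this illustration. The last helper applies the same rounding argument to the constants in the question: merging the additions of 1.0, 2.0, and 3.0 into a single addition of 6.0 would reassociate the sum, and that can change the rounded result.

```python
import random


def sum_of_copies(x, n):
    """Add x to itself left to right, mirroring the chain of fadd instructions."""
    total = x
    for _ in range(n - 1):
        total = total + x
    return total


def count_mismatches(n, samples=10000, seed=0):
    """Count sampled doubles for which the n-term sum differs from n * x."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(samples):
        # Spread the samples over a wide range of exponents (all finite).
        x = rng.uniform(-1.0, 1.0) * 2.0 ** rng.randint(-300, 300)
        if sum_of_copies(x, n) != float(n) * x:
            mismatches += 1
    return mismatches


def fold_constants_mismatch(x):
    """Return True if ((x + 1) + 2) + 3 differs from x + 6 for this x."""
    return ((x + 1.0) + 2.0) + 3.0 != x + 6.0


if __name__ == "__main__":
    print(count_mismatches(2))            # x + x versus 2 * x
    print(count_mismatches(3))            # x + x + x versus 3 * x
    print(fold_constants_mismatch(1e16))  # a value where adding 1.0 is lost to rounding
```

Both counts should come out as zero, since each side involves at most one rounding of the same exact value. For x = 1e16 the folded and unfolded sums differ, because neighbouring doubles at that magnitude are 2 apart and the intermediate additions round differently from a single addition of 6.0. Random sampling is only a sanity check, not a proof.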