When a compiler performs a loop-unroll optimization, how does it determine the factor by which to unroll the loop, or whether to unroll the whole loop? Since this is a space-performance trade-off, how effective is this optimization technique, on average, at making the program perform better? Also, under what conditions is it recommended to use this technique (i.e., for certain operations or calculations)?
This doesn’t have to be specific to a certain compiler. It can be any explanation outlining the idea behind this technique and what has been observed in practice.
the compiler weighs several things: stack consumption and locality; instruction counts; its ability to make or propagate further optimizations on the unrolled and inlined program; whether the loop size (its iteration count) is fixed or expected to fall in a certain range; profile inputs (if applicable); operations which may be removed from the loop body once it is unrolled; etc.
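to make the transformation concrete, here is a minimal hand-written sketch in C of what unrolling does to a simple reduction. the function names and the factor of 4 are made up for illustration; a real compiler picks its own factor from its cost model and may emit something quite different.

```c
#include <stddef.h>

/* Rolled form: one add, one counter update and one branch per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hand-unrolled by 4 (an arbitrary factor chosen for illustration). */
long sum_unrolled4(const int *a, size_t n)
{
    long t0 = 0, t1 = 0, t2 = 0, t3 = 0;
    size_t i = 0;

    /* Main loop: four elements per iteration, so roughly a quarter
       of the counter updates and branches of the rolled form. */
    for (; i + 4 <= n; i += 4) {
        t0 += a[i];
        t1 += a[i + 1];
        t2 += a[i + 2];
        t3 += a[i + 3];
    }

    /* Remainder loop: needed because n is not known to be a multiple of 4. */
    for (; i < n; i++)
        t0 += a[i];

    return t0 + t1 + t2 + t3;
}

/* Fixed iteration count: the loop can be unrolled completely,
   leaving no counter and no branch at all. */
long sum_fixed3(const int a[3])
{
    return (long)a[0] + a[1] + a[2];
}
```

the unrolled version has more instructions (the space cost) and needs the extra remainder loop, which is why a known or bounded iteration count makes the decision easier. the four separate accumulators are one example of a follow-on optimization that only becomes possible after unrolling.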
it depends largely on the input (your program). it can be slower (not typical) or it can be several times faster. writing a program that runs optimally and that also enables the optimizer to do its job is a learned skill.
generally, loops with a large number of iterations over very small bodies, particularly bodies which are branchless and have good data locality.
if you want to know if the option helps your app, profile.
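as a rough sketch of what "profile" can mean at its simplest, the C snippet below times a small branchless loop with the standard `clock()` function. `sum_array` and `time_sum` are hypothetical names, and this is not a rigorous benchmark: an optimizer may still hoist or simplify the repeated calls, so treat the numbers with care and prefer a real profiler for anything important.

```c
#include <stddef.h>
#include <time.h>

/* Small, branchless body over contiguous data: a typical unrolling candidate. */
long sum_array(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Calls sum_array `repeats` times and returns the CPU time used, in seconds.
   The last result is written through `out` so the caller can check it. */
double time_sum(const int *a, size_t n, int repeats, long *out)
{
    volatile long sink = 0;   /* volatile: every result is actually stored */
    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        sink = sum_array(a, n);
    clock_t end = clock();
    *out = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

one way to use it: build the same source twice, for example with GCC once at `-O2` and once at `-O2 -funroll-loops`, run both on input sizes that are realistic for your app, and compare the reported times.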
if you need more than that, you should reserve some time to learn how to write optimal programs, since the subject is quite complex.