I ran into an unexpected code optimization recently, and wanted to check whether my interpretation of what I was observing was correct. The following is a much simplified example of the situation:
let demo =
let swap fst snd i =
if i = fst then snd else
if i = snd then fst else
i
[ for i in 1 .. 10000 -> swap 1 i i ]
let demo2 =
let swap (fst: int) snd i =
if i = fst then snd else
if i = snd then fst else
i
[ for i in 1 .. 10000 -> swap 1 i i ]
The only difference between the 2 blocks of code is that in the second case, I explicitly declare the arguments of swap as integers. Yet, when I run the 2 snippets in fsi with #time, I get:
Case 1 Real: 00:00:00.011, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
Case 2 Real: 00:00:00.004, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
i.e. the 2nd snippet runs 3 times faster than the first. The absolute performance difference here is obviously not an issue, but if I were to use the swap function a lot, it would pile up.
My assumption is that the reason for the performance hit is that in the first case, swap is generic and “requires equality”, and checks whether int supports it, whereas the second case doesn’t have to check anything. Is this the reason this is happening, or am I missing something else? And more generally, should I consider automatic generalization a double-edged sword, that is, an awesome feature which may have unexpected effects on performance?
I think this is generally the same case as in the question Why is this F# code so slow. In that question, the performance issue is caused by a constraint requiring
comparisonand in your case, it is caused by theequalityconstraint.In both cases, the compiled generic code has to use interfaces (and boxing), while the specialized compiled code can directly use IL instructions for comparison or equality of integers or floating-point numbers.
The two ways to avoid the performance issue are:
intorfloatas you didinlineso that the compiler specializes it automaticallyFor smaller functions, the second approach is better, because it does not generate too much code and you can still write functions in a generic way. If you use function just for single type (by design), then using the first approach is probably appropriate.