I’m implementing some algorithms that work on large data (~250 MB – 1 GB). For this I needed a loop to do some benchmarking. In the process, however, I learned that F# is doing some nasty things, which I hope some of you can clarify.
Here is my code (description of the problem is below):
open System
for i = 1 to 10 do
    Array2D.zeroCreate 10000 10000 |> ignore
    printfn "%d" (GC.GetTotalMemory(true))
Array2D.zeroCreate 10000 10000 |> ignore
// should force a garbage collection, and GC.Collect() doesn't help either
printfn "%d" (GC.GetTotalMemory(true))
Array2D.zeroCreate 10000 10000 |> ignore
printfn "%d" (GC.GetTotalMemory(true))
Array2D.zeroCreate 10000 10000 |> ignore
printfn "%d" (GC.GetTotalMemory(true))
Array2D.zeroCreate 10000 10000 |> ignore
printfn "%d" (GC.GetTotalMemory(true))
Console.ReadLine() |> ignore
The output looks like this:
54000
54000
54000
54000
54000
54000
54000
54000
54000
54000
400000000
800000000
1200000000
Out of memory exception
So, in the loop F# discards the result, but when I’m not in the loop F# will keep references to “dead data” (I’ve looked in the IL, and apparently the class Program gets fields for this data). Why? And can I fix that?
This code was run outside Visual Studio, in release mode.
The reason for this behavior is that the F# compiler behaves differently in the global scope than in a local scope. A variable declared at global scope is turned into a static field: a module declaration is compiled to a static class, with its let declarations compiled as fields/properties/methods.

The simplest way to fix the problem is to write your code in a function and call that function from the top level.
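A minimal sketch of the function-based fix, under a few assumptions: the names run and size are hypothetical, the element type is pinned to obj explicitly, and the function returns its measurements only so they can be inspected. Inside a function the discarded arrays are temporaries of that function rather than static fields of the module class, so the collector should be free to reclaim them.

```fsharp
open System

// Hypothetical wrapper: 'run' and 'size' are illustrative names,
// not taken from the original post.
let run (size: int) : int64 list =
    // Inside a function, the discarded array is only a temporary,
    // so no static field of the module class holds on to it.
    Array2D.zeroCreate<obj> size size |> ignore
    let m1 = GC.GetTotalMemory(true)   // true = force a full collection first
    printfn "%d" m1
    Array2D.zeroCreate<obj> size size |> ignore
    let m2 = GC.GetTotalMemory(true)
    printfn "%d" m2
    Array2D.zeroCreate<obj> size size |> ignore
    let m3 = GC.GetTotalMemory(true)
    printfn "%d" m3
    [ m1; m2; m3 ]

// A small size keeps the sketch cheap to run; the question used 10000.
run 1000 |> ignore
```

Calling run 10000 instead reproduces the sizes from the question. The exact numbers printed depend on the runtime and on whether the build is optimized, so none are shown here.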
But why does the compiler declare fields when you’re not using the value and just ignore it? This is quite interesting: the ignore function is a very simple function that is inlined when you use it. Its declaration is let inline ignore _ = (). When inlining the function, the compiler declares some variables (to store the arguments of the function).

So, another way to fix this is to omit ignore and write each allocation as a bare expression statement. You’ll get some compiler warnings, because the result of the expression is not unit, but it will work. However, wrapping the code in a function so that it runs in local scope is probably more reliable.
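For illustration, the variant without ignore might look like the sketch below. The explicit obj element type and the smaller array size are assumptions made here to keep the example unambiguous and cheap to run; the question used 10000. Whether the compiler still creates fields in this form may depend on the compiler version, so it is worth checking the generated IL.

```fsharp
open System

// Top-level code without 'ignore'. The compiler emits a warning for each
// allocation line, because its result is not unit and is implicitly discarded.
Array2D.zeroCreate<obj> 1000 1000
printfn "%d" (GC.GetTotalMemory(true))
Array2D.zeroCreate<obj> 1000 1000
printfn "%d" (GC.GetTotalMemory(true))
```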