I’m curious as to why Haskell implementations use a GC.
I can’t think of a case where GC would be necessary in a pure language. Is it just an optimization to reduce copying, or is it actually necessary?
I’m looking for example code that would leak if a GC wasn’t present.
As others have already pointed out, Haskell requires automatic, dynamic memory management: automatic memory management is necessary because manual memory management is unsafe; dynamic memory management is necessary because for some programs, the lifetime of an object can only be determined at runtime.
For example, consider the following program:
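A minimal sketch of such a program (the names loop and step and the end-of-input check are assumptions for illustration; only the list [1..1000] and the "clear" command come from the text):

```haskell
import System.IO (isEOF)

-- Hypothetical helper: decide what stays alive after one line of input.
-- Typing "clear" drops the only reference to the list.
step :: String -> Maybe [Int] -> Maybe [Int]
step "clear" _   = Nothing
step _       obj = obj

-- Print the current object, then read the next command (stop at end of input).
loop :: Maybe [Int] -> IO ()
loop obj = do
  print obj
  done <- isEOF
  if done
    then return ()
    else do
      resp <- getLine
      loop (step resp obj)

main :: IO ()
main = loop (Just [1 .. 1000])
```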
In this program, the list [1..1000] must be kept in memory until the user types “clear”; so the lifetime of this must be determined dynamically, and this is why dynamic memory management is necessary.

So in this sense, automated dynamic memory allocation is necessary, and in practice this means: yes, Haskell requires a garbage collector, since garbage collection is the highest-performance automatic dynamic memory manager.
However…
Although a garbage collector is necessary, we might try to find some special cases where the compiler can use a cheaper memory management scheme than garbage collection. For instance, given
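a definition along the following lines (a hypothetical sketch; only the names f and x2 come from the text, the body is an assumption):

```haskell
-- x2 is a local intermediate value; only a fresh result leaves f.
f :: Integer -> Integer
f x =
  let x2 = x * x   -- candidate for stack allocation
  in  x2 * x2      -- x2 itself is not part of the result

main :: IO ()
main = print (f 3)  -- prints 81
```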
we might hope for the compiler to detect that x2 can safely be deallocated when f returns (rather than waiting for the garbage collector to deallocate x2). Essentially, we are asking that the compiler perform escape analysis to convert allocations in the garbage-collected heap to allocations on the stack wherever possible.

This is not too unreasonable to ask for: the jhc Haskell compiler does this, although GHC does not. Simon Marlow says that GHC’s generational garbage collector makes escape analysis mostly unnecessary.
jhc actually uses a sophisticated form of escape analysis known as region inference. Consider
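a pair of functions along these lines (a hypothetical sketch; the names f, g and x2 and the tuple result come from the text, the arithmetic is an assumption):

```haskell
-- x2 escapes from f because it is returned inside the tuple.
f :: Integer -> (Integer, Integer)
f x =
  let x2 = x * x
  in  (x2, x2 + 1)

-- g consumes the tuple; nothing built by f is reachable once g returns.
g :: Integer -> Integer
g x = case f x of
  (y, z) -> y + z

main :: IO ()
main = print (g 3)  -- prints 19
```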
In this case, a simplistic escape analysis would conclude that x2 escapes from f (because it is returned in the tuple), and hence x2 must be allocated on the garbage-collected heap. Region inference, on the other hand, is able to detect that x2 can be deallocated when g returns; the idea here is that x2 should be allocated in g’s region rather than f’s region.

Beyond Haskell
While region inference is helpful in certain cases as discussed above, it appears to be difficult to reconcile effectively with lazy evaluation (see Edward Kmett’s and Simon Peyton Jones’ comments). For instance, consider
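a function such as the following (a hypothetical sketch; only the name f and the list [1..n] come from the text, the use of product is an assumption):

```haskell
-- The list [1 .. n] is produced lazily and consumed by product as it goes,
-- so each cell becomes garbage as soon as it has been multiplied in.
f :: Integer -> Integer
f n = product [1 .. n]

main :: IO ()
main = print (f 10)  -- prints 3628800
```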
One might be tempted to allocate the list [1..n] on the stack and deallocate it after f returns, but this would be catastrophic: it would change f from using O(1) memory (under garbage collection) to O(n) memory.

Extensive work was done in the 1990s and early 2000s on region inference for the strict functional language ML. Mads Tofte, Lars Birkedal, Martin Elsman, and Niels Hallenberg have written a quite readable retrospective on their work on region inference, much of which they integrated into the MLKit compiler. They experimented with purely region-based memory management (i.e. no garbage collector) as well as hybrid region-based/garbage-collected memory management, and reported that their test programs ran “between 10 times faster and 4 times slower” than pure garbage-collected versions.