I was trying out different approaches to getting a number at a given index of the Fibonacci sequence and they could basically be divided into two categories:
- building a list and querying an index
- using variables (might be separate or tupled, without a list)
I picked an example of both:
fibs1 :: Int -> Integer
fibs1 n = fibs1' !! n
  where fibs1' = 0 : scanl (+) 1 fibs1'
fib2 :: Int -> Integer
fib2 n = fib2' 1 1 n where
  fib2' _ b 2 = b
  fib2' a b n = fib2' b (a + b) (n - 1)
fibs1:
real 0m2.356s
user 0m2.310s
sys 0m0.030s
fib2:
real 0m0.671s
user 0m0.667s
sys 0m0.000s
Both were compiled with 64-bit GHC 7.6.1 and -O2 -fllvm. Their Core dumps are very similar in length, but they differ in the parts that I’m not very proficient at interpreting.
I was not surprised that fibs1 failed for n = 350000 (Stack space overflow). However, I am not comfortable with the fact that it used that much memory.
I would like to clear some things up:
- Why does the GC not take care of the beginning of the list throughout computation even though most of it quickly becomes useless?
- Why does GHC not optimize the list version to a variable version since only two of its elements are required at once?
EDIT: Sorry, I mixed the speed results, fixed. Two of three of my doubts are still valid, though ;).
fibs1 uses a lot of memory and is slow because scanl is lazy: it doesn’t evaluate the list elements, so each new element is produced as an unevaluated sum of earlier elements, the next one as a sum involving that thunk, etc. So you rather quickly get a huge nested thunk. When that thunk is evaluated, it is pushed on the stack, and at some point between 250000 and 350000, it becomes too big for the default stack.
And since each list element holds a reference to the previous while it is not evaluated, the beginning of the list cannot be garbage-collected.
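To make the thunk build-up concrete, here is a small sketch that puts the lazy list next to one built with a strict scan of the kind discussed next. The names fibsLazy, scanlStrict and fibStrict are made up for this illustration, and the unfolding shown in the comment is schematic, not actual GHC output.

```haskell
-- The lazy list from the question, lifted to the top level.
fibsLazy :: [Integer]
fibsLazy = 0 : scanl (+) 1 fibsLazy

-- Schematic view of the cells while nobody has forced their values
-- (not real GHC output):
--   0 : 1 : t2 : t3 : t4 : ...
--     where t2 = 1 + 0; t3 = t2 + 1; t4 = t3 + t2; ...
-- Each unevaluated sum refers to the ones before it.

-- A strict scan (hypothetical helper): the next accumulator is forced
-- before the current cell is handed out, so no chain of sums builds up.
scanlStrict :: (b -> a -> b) -> b -> [a] -> [b]
scanlStrict f q (x:xs) = let q' = f q x in q' `seq` (q : scanlStrict f q' xs)
scanlStrict _ q []     = [q]

-- Same shape as fibs1 from the question, with the strict scan swapped in.
fibStrict :: Int -> Integer
fibStrict n = fibs !! n
  where fibs = 0 : scanlStrict (+) 1 fibs
```

Both versions compute the same numbers; the only intended difference is when the additions are performed.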
If you use a strict scan, then when the k-th list cell is produced, its value is already evaluated, so it doesn’t refer to a previous cell; hence the list can be garbage collected (assuming nothing else holds a reference to it) as it is traversed.
With such an implementation, the list version is about as fast and lean as fib2 (it needs to allocate list cells nevertheless, so it allocates a small bit more, and is possibly a tiny bit slower therefore, but the difference is minute, since the Fibonacci numbers become so large that the list construction overhead becomes negligible).
The idea of scanl is that its result is incrementally consumed, so that the consumption forces the elements and prevents the build-up of large thunks.
As for why GHC does not optimize the list version to a variable version: its optimiser can’t see through the algorithm to determine that.
scanl is opaque to the compiler; it doesn’t know what scanl does.
If we take the exact source code for scanl (renaming it or hiding scanl from the Prelude; I opted for renaming), compile the module exporting it (with -O2), and then look at the generated interface file, we see (with minor differences between compiler versions) that the interface file doesn’t expose the unfolding of the function, only its type, arity, strictness and that it doesn’t refer to CAFs.
When a module importing that is compiled, all that the compiler has to go by is the information exposed by the interface file.
Here, there is no information exposed that would allow the compiler to do anything else but emit a call to the function.
If the unfolding were exposed, the compiler would have a chance to inline the unfolding and analyse the code, knowing the types and the combination function, to produce more eager code that doesn’t build thunks.
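For reference, the renamed copy of scanl mentioned above could look roughly like the following. It follows the definition given in the Haskell 2010 Report's Prelude, under the made-up name scanlCopy; the version shipped with a particular GHC may differ in details. The interface file generated for a module containing it can be dumped with ghc --show-iface applied to the .hi file.

```haskell
-- A copy of scanl under a made-up name, so that it can live in its own
-- module without clashing with the Prelude.
-- The current accumulator q is emitted first; the input list is only
-- inspected when the tail of the result is demanded.
scanlCopy :: (b -> a -> b) -> b -> [a] -> [b]
scanlCopy f q ls = q : case ls of
  []   -> []
  x:xs -> scanlCopy f (f q x) xs
```

Note that the first cell is produced without looking at the input at all, which matters for the laziness argument that follows.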
The semantics of scanl, however, are maximally lazy: each element of the output is emitted before the input list is inspected. That has the consequence that GHC can’t make the addition strict, since that would change the result if the list contained any undefined values: the lazy scanl still yields a list whose cells can be inspected, while a strict scan fails with an exception.
One could make a variant that would produce 1 : *** Exception: Prelude.undefined for such an input (with 1 as the starting value), but any strictness would indeed change the result if the list contained undefined values, so even if the compiler knew the unfolding, it couldn’t make the evaluation strict, unless it could prove that there are no undefined values in the list, a fact that is obvious to us, but not to the compiler (and I don’t think it would be easy to teach a compiler to recognize that and be able to prove the absence of undefined values).
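The difference in behaviour can be sketched as follows. The input scanl (+) 1 [undefined] is chosen here purely for illustration, and scanlStrict, scanlSemi and throwsWhenForced are hypothetical names, not library functions.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Fully strict sketch: force the next accumulator before emitting
-- the current cell.
scanlStrict :: (b -> a -> b) -> b -> [a] -> [b]
scanlStrict f q (x:xs) = let q' = f q x in q' `seq` (q : scanlStrict f q' xs)
scanlStrict _ q []     = [q]

-- In-between sketch: emit the current cell first, and force the next
-- accumulator only when the tail of the result is demanded.
scanlSemi :: (b -> a -> b) -> b -> [a] -> [b]
scanlSemi f q ls = q : case ls of
  []   -> []
  x:xs -> let q' = f q x in q' `seq` scanlSemi f q' xs

-- True if evaluating the argument to weak head normal form throws.
throwsWhenForced :: a -> IO Bool
throwsWhenForced v = do
  r <- try (evaluate v)
  return (either (\e -> const True (e :: SomeException)) (const False) r)
```

With the input scanl (+) 1 [undefined], the lazy scanl from the Prelude yields a two-cell list whose second element is the failing sum, scanlStrict fails before producing any cell, and scanlSemi hands out the 1 and fails only when its tail is demanded.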