I am curious about the runtime performance of an infinite list like
the one below:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
This will create an infinite list of the fibonacci sequence.
My question is that if I do the following:
takeWhile (<5) fibs
how many times does fibs evaluate each term in the list? It seems
that since takeWhile checks the predicate function for each item in
the list, the fibs list will evaluate each term multiple times. The
first 2 terms are given for free. When takeWhile wants to evaluate
(<5) on the 3rd element, we will get:
1 : 1 : zipWith (+) [(1, 1), (1)] => 1 : 1 : 3
Now, once takeWhile wants to evaluate (<5) on the 4th element: the
recursive nature of fibs will construct the list again like the
following:
1 : 1 : zipWith (+) [(1, 2), (2, 3)] => 1 : 1 : 3 : 5
It would seem that the 3rd element needs to be computed again when we
want to evaluate the value of the 4th element. Furthermore, if the
predicate in takeWhile is large, it would indicate the function is
doing more work that is needed since it is evaluating each preceding
element in the list multiple times. Is my analysis here correct or is
Haskell doing some caching to prevent multiple evaluations here?
This is a self-referential, lazy data structure, where “later” parts of the structure refer to earlier parts by name.
Initially, the structure is just a computation with unevaluated pointers back to itself. As it is unfolded, values are created in the structure. Later references to already-computed parts of the structure are able to find the value already there waiting for them. No need to re-evaluate the pieces, and no extra work to do!
The structure in memory begins as just an unevaluated pointer. Once we look at the first value, it looks like this:
(a pointer to a cons cell, pointing at ‘1’, and a tail holding the second ‘1’, and a pointer to a function that holds references back to fibs, and the tail of fibs.
Evaluating one more step expands the structure, and slides the references along:
And so we go unfolding the structure, each time yielding a new unevaluated tail, which is a closure holding references back to 1st and 2nd elements of the last step. This process can continue infinitely 🙂
And because we’re referring to prior values by name, GHC happily retains them in memory for us, so each item is evaluated only once.