I’ve seen many examples in functional languages about processing a list and constructing a function to do something with its elements after receiving some additional value (usually not present at the time the function was generated), such as:
-
Calculating the difference between each element and the average
(the last 2 examples under “Lazy Evaluation”)
-
Staging a list append in strict functional languages such as ML/OCaml, to avoid traversing the first list more than once
(the section titled “Staging”)
-
Comparing a list to another with foldr (i.e. generating a function to compare another list to the first)
listEq a b = foldr comb null a b where comb x frec [] = False comb x frec (e:es) = x == e && frec es cmp1To10 = listEq [1..10]
In all these examples, the authors generally remark the benefit of traversing the original list only once. But I can’t keep myself from thinking “sure, instead of traversing a list of N elements, you are traversing a chain of N evaluations, so what?”. I know there must be some benefit to it, could someone explain it please?
Edit: Thanks to both for the answers. Unfortunately, that’s not what I wanted to know. I’ll try to clarify my question, so it’s not confused with the (more common) one about creating intermediate lists (which I already read about in various places). Also thanks for correcting my post formatting.
I’m interested in the cases where you construct a function to be applied to a list, where you don’t yet have the necessary value to evaluate the result (be it a list or not). Then you can’t avoid generating references to each list element (even if the list structure is not referenced anymore). And you have the same memory accesses as before, but you don’t have to deconstruct the list (pattern matching).
For example, see the “staging” chapter in the mentioned ML book. I’ve tried it in ML and Racket, more specifically the staged version of “append” which traverses the first list and returns a function to insert the second list at the tail, without traversing the first list many times. Surprisingly for me, it was much faster even considering it still had to copy the list structure as the last pointer was different on each case.
The following is a variant of map which after applied to a list, it should be faster when changing the function. As Haskell is not strict, I would have to force the evaluation of listMap [1..100000] in cachedList (or maybe not, as after the first application it should still be in memory).
listMap = foldr comb (const [])
where comb x rest = \f -> f x : rest f
cachedList = listMap [1..100000]
doubles = cachedList (2*)
squares = cachedList (\x -> x*x)
-- print doubles and squares
-- ...
I know in Haskell it doesn’t make a difference (please correct me if I’m wrong) using comb x rest f = ... vs comb x rest = \f -> ..., but I chose this version to emphasize the idea.
Update: after some simple tests, I couldn’t find any difference in execution times in Haskell. The question then is only about strict languages such as Scheme (at least the Racket implementation, where I tested it) and ML.
So the answer to your question is, partial compilation. Done ahead of time, it makes it so that there’s no need to traverse the list to get to the individual elements – all the references are found in advance and stored inside the pre-compiled function.
As to your concern about the need for that function to be traversed too, it would be true in interpreted languages. But compilation eliminates this problem.
In the presence of laziness this coding trick may lead to the opposite results. Having full equations, e.g. Haskell GHC compiler is able to perform all kinds of optimizations, which essentially eliminate the lists completely and turn the code into an equivalent of loops. This happens when we compile the code with e.g.
-O2switch.Writing out the partial equations may prevent this compiler optimization and force the actual creation of functions – with drastic slowdown of the resulting code. I tried your
cachedListcode and saw a 0.01s execution time turn into 0.20s (don’t remember right now the exact test I did).