In the following program, I would only expect cycle3 to run with constant memory (it explicitly ties the knot). However, cycle2 also runs in constant memory for reasons I cannot fathom. I expect cycle2 to do the exact same work as cycle1 because
xs' = xs ++ xs'
xs' = xs ++ xs ++ xs' -- substitute value of xs'
xs' = xs ++ xs ++ xs ++ xs' -- substitute value of xs' again
xs' = xs ++ xs ++ xs ++ ... -- and so on
Can someone explain what I’m missing here?
module Main where
import System.Environment (getArgs)
cycle1 :: [a] -> [a]
cycle1 [] = error "empty list"
cycle1 xs = xs ++ cycle1 xs
cycle2 :: [a] -> [a]
cycle2 [] = error "empty list"
cycle2 xs = xs' where xs' = xs ++ xs'
cycle3 :: [a] -> [a]
cycle3 [] = error "empty list"
cycle3 xs = let
xs' = go xs' xs
in xs'
where
go :: [a] -> [a] -> [a]
go start [last] = last : start
go start (x:xs) = x : go start xs
testMem :: (Show a) => ([a] -> [a]) -> [a] -> IO ()
testMem f xs = print (xs', xs') -- prints only first val, need second val holds onto reference
where
xs' = f xs
main :: IO ()
main = do
args <- getArgs
let mCycleFunc = case args of
["1"] -> Just cycle1
["2"] -> Just cycle2
["3"] -> Just cycle3
_ -> Nothing
case mCycleFunc of
Just cycleFunc -> testMem cycleFunc [0..8]
Nothing -> putStrLn "Valid args are one of {1, 2, 3}."
It comes down to sharing or non-sharing of equal thunks. Two equal thunks are thunks that are guaranteed to produce the same result. In the case of
cycle1, you’re creating a new thunk forcycle1 xseach time you hit the[]at the end ofxs. New memory needs to be allocated for that thunk, and its value needs to be computed from scratch, which allocates new list pairs as you go through it.I think the way
cycle2avoids this becomes easier to understand if you renamexs'toresult(and I removed the “error on[]” case):This definition is semantically equivalent to
cycle1(produces the same results for the same arguments), but the key to understanding the memory usage is to look at it in terms of what thunks get created. When you execute the compiled code for this function, all it does right away is create a thunk forresult. You can think of the thunk as a mutable type more or less like this (total made-up pseudocode):This is either a record containing pointers to the thunks for the values needed plus a pointer to the code that forces these thunks, or just the result of the calculation. In the case of
cycle2, the thunk forresultpoints to the object code for(++)and the thunks forxsandresult. This last bit means that the thunk forresulthas a pointer back to itself, which explains the constant space behavior; the last step in forcingresultis to make it point back to itself.In the case of
cycle1, on the other hand the thunk has the code for(++), a thunk forxs, and a new thunk to calculatecycle1 xsfrom scratch. In principle it would be possible for the compiler to recognize that the reference to this latter thunk could be substituted with one to the “parent” chunk, but the compiler doesn’t do that; whereas incycle2it can’t help but do it (one instantiated binding of one variable = one chunk).Note that this self-referential thunk behavior can be factored out into a suitable implementation of
fix: