I’m reading Real World Haskell, Chapter 8, and wanted to see how the SumFile.hs program handles, say, 1 million numbers:

    main :: IO ()
    main = do
        contents <- getContents
        print (sumFile contents)
      where sumFile = sum . map read . words

When I feed 1 million integers to the program with

    runhaskell SumFile.hs < data.txt

the program gives a correct result.
However, when I compiled it using GHC:

    ghc SumFile.hs

The binary gives a “Stack space overflow” error:

    ./SumFile < data.txt
    Stack space overflow: current size 8388608 bytes.
    Use `+RTS -Ksize -RTS' to increase it.

I have two questions:
- What is causing the stack space usage?
- Why does the compiled version differ from the interpreted version and what can I do?
Thanks!
EDIT:
Alright, the reason is map, but here’s a modified version that uses a lazy ByteString:

    import qualified Data.ByteString.Lazy as L
    import qualified Data.ByteString.Lazy.Char8 as LCHAR
    import Data.Monoid
    import Data.List

    main :: IO ()
    main = do
        contents <- L.getContents
        case sumFile contents of
            Nothing -> print "Invalid input"
            Just s -> print $ getSum s
      where sumFile = foldl' mappend (Just (Sum 0)) . map ((fmap Sum) . (fmap fst) . LCHAR.readInt) . (LCHAR.words)

The result is the same: the binary version still runs out of stack space even though I’m not using sum.
First, a simple clarification: the "stack" in the GHC runtime has nothing to do with the process's stack segment. It is an internal data structure of the runtime, and this kind of overflow is not a source of buffer-overflow style attacks.
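Second, Haskell is lazy. Lazy IO (getContents) produces a lazy list, and sum produces its result lazily: instead of adding as it walks the list, it builds up a chain of unevaluated additions, ((0 + x1) + x2) + ... Once the result of sum is requested, evaluating that chain has to recurse all the way down it, which quickly exhausts the stack space (you can look at the source of sum if you wish).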
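The Haskell Report's Prelude defines sum as foldl (+) 0, a lazy left fold. The sketch below makes the shape of the expression such a fold builds visible by folding into a String instead of a number; showFoldl is a hypothetical helper for illustration only, not part of any library.

```haskell
-- Hypothetical helper: render the nested expression that a lazy left fold
-- builds up. Each step wraps the accumulator in another "(... + x)",
-- which mirrors the chain of (+) thunks that foldl (+) 0 creates.
showFoldl :: [Integer] -> String
showFoldl = foldl (\acc x -> "(" ++ acc ++ " + " ++ show x ++ ")") "0"

main :: IO ()
main = putStrLn (showFoldl [1, 2, 3])
-- prints (((0 + 1) + 2) + 3)
```

With a million numbers the chain is a million levels deep, and evaluating it needs stack proportional to that depth.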
To avoid this, use a strict version of sum, which should eliminate the problem. The standard library (Data.List) has a function for exactly this case: foldl', a strict version of foldl. Using foldl' (+) 0 in place of sum should fix it.

Third, stack space leaks are a very common problem when using lazy IO. They can be avoided by switching to iteratee-based IO; otherwise, you have to learn to add strictness annotations where they are needed.
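A minimal sketch of both fixes applied to the original SumFile.hs, assuming the input is whitespace-separated integers as in the question. sumFile uses foldl'; sumFileBang is a hypothetical alternative name showing the same loop written with a bang pattern, which is one form of strictness annotation.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Replacement for `sum . map read . words`: foldl' evaluates the running
-- total to weak head normal form at each step; for Integer that means the
-- number is fully computed, so no chain of (+) thunks builds up.
sumFile :: String -> Integer
sumFile = foldl' (+) 0 . map read . words

-- The same loop written by hand; the bang pattern on `acc` forces the
-- accumulator every time `go` is called.
sumFileBang :: String -> Integer
sumFileBang = go 0 . map read . words
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs

main :: IO ()
main = do
  contents <- getContents
  print (sumFile contents)
```

Either function can be used in main; foldl' is usually the more idiomatic choice.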
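And by the way, GHC is an optimizing compiler. It is not common, but it is possible for a compiled program to have memory problems that do not show up in ghci, and vice versa. Note that plain ghc SumFile.hs compiles without optimisation; compiling with ghc -O2 SumFile.hs enables GHC's strictness analysis, which in many cases turns a lazy accumulator like the one in sum into a strict loop.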
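The same reasoning likely explains the edited ByteString version in the question. foldl' forces the accumulator only to weak head normal form, and Just (Sum ...) is already in that form, so the number inside the Just still grows as a chain of (+) thunks. Separately, under the standard Monoid instance for Maybe, Nothing is the identity element, so mappend-ing a Nothing leaves the running Just unchanged: that fold skips unparseable words rather than reporting invalid input. The sketch below assumes that rejecting the whole input was the intent, and forces the total before wrapping it in Just.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LCHAR
import Data.List (foldl')

-- Sum whitespace-separated integers; Nothing if any word fails to parse.
sumFile :: LCHAR.ByteString -> Maybe Int
sumFile = foldl' step (Just 0) . LCHAR.words
  where
    step :: Maybe Int -> LCHAR.ByteString -> Maybe Int
    step Nothing _ = Nothing -- once invalid, stay invalid
    step (Just acc) w = case LCHAR.readInt w of
      Nothing     -> Nothing
      -- `$!` evaluates acc + n before Just is applied, so the total inside
      -- the Just is always a computed Int rather than a thunk
      Just (n, _) -> Just $! acc + n

main :: IO ()
main = do
  contents <- LCHAR.getContents
  case sumFile contents of
    Nothing -> putStrLn "Invalid input"
    Just s  -> print s
```

As in the original, readInt accepts a leading integer and ignores any trailing characters in a word.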