Here is my code (it gets the line count and word count of a file):
    import System.IO
    import Data.Maybe

    readL :: (Int,Int,Int) -> IO ()
    readL (w,l,-1) = do
        putStrLn $ "word:" ++ (show w) ++ "\nline:" ++ (show l)
    readL (w,l,0) = do
        s <- hIsEOF stdin
        if s
            then readL (w,l,-1)
            else do
                f <- getLine
                readL (w + length f, l + 1, 0)

    main = do
        hSetBinaryMode stdin True
        readL (0,0,0)

When I process a 100 MB file, it crashes with this error:
Stack space overflow: current size 8388608 bytes
Is there something I wrote wrong?
I also have another version here:
    import System.IO
    import Data.List

    main = do
        hSetBinaryMode stdin True
        interact $ (\(w,l) -> "line:" ++ (show l) ++ "\nwords:" ++ (show w) ++ "\n")
                 . foldl' (\(w,l) r -> (w + length r, l + 1)) (0,0)
                 . lines

This has the same problem and also uses a lot of memory. Can anybody solve this? I'm new to Haskell.

The problem is that neither the w nor the l parameter to readL is evaluated before the end of input is reached. So for an input with many lines, you build huge thunks (((0 + length line1) + length line2) ... + length lastline), and similarly for l. For more than half a million lines or so, evaluating that thunk will not fit in the available stack. Additionally, each length f holds on to the line read until it is evaluated, causing unnecessarily large memory use.

You have to keep the accumulating parameters evaluated in the loop. The easiest way is with bang patterns or a seq.

The foldl' version has the same problem: foldl' only evaluates the accumulator pair to weak head normal form, that is, to the outermost constructor, here (,). It does not force evaluation of the components. To do that, you can use a strict pair type for the fold or use seq in the folded function.
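
For the readL loop, the bang-pattern and seq fixes might look like the sketch below. It keeps the question's tuple shape. The names report and readLSeq are hypothetical helpers introduced only for this illustration, and the BangPatterns extension is assumed to be available (it is in GHC).

```haskell
{-# LANGUAGE BangPatterns #-}
import System.IO

-- Hypothetical helper: formats the final counts like the question's code.
report :: Int -> Int -> String
report w l = "word:" ++ show w ++ "\nline:" ++ show l

-- Variant 1: bang patterns. Matching (!w, !l, ...) forces both counters
-- on every call, so they never grow into a chain of pending additions.
readL :: (Int, Int, Int) -> IO ()
readL (!w, !l, -1) = putStrLn (report w l)
readL (!w, !l, 0) = do
    eof <- hIsEOF stdin
    if eof
        then readL (w, l, -1)
        else do
            f <- getLine
            readL (w + length f, l + 1, 0)

-- Variant 2: seq. The new counters are evaluated explicitly before the
-- recursive call, which also lets the line f be garbage collected.
readLSeq :: (Int, Int, Int) -> IO ()
readLSeq (w, l, -1) = putStrLn (report w l)
readLSeq (w, l, 0) = do
    eof <- hIsEOF stdin
    if eof
        then readLSeq (w, l, -1)
        else do
            f <- getLine
            let w' = w + length f
                l' = l + 1
            w' `seq` l' `seq` readLSeq (w', l', 0)

main :: IO ()
main = do
    hSetBinaryMode stdin True
    readL (0, 0, 0)  -- readLSeq (0, 0, 0) is meant to behave the same way
```

Either variant keeps each counter as a plain evaluated Int between iterations, so the stack depth should no longer depend on the number of lines.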
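
For the foldl' version, the two options (a strict pair type, or seq in the folded function) could be sketched as follows. The type P and the functions countStrictPair, countSeq and render are hypothetical names chosen for this example, not part of any library.

```haskell
import Data.List (foldl')
import System.IO

-- Hypothetical strict pair: the bangs on the fields mean that building
-- a P value evaluates both components.
data P = P !Int !Int

-- Option 1: fold with the strict pair type. foldl' forces the P
-- constructor at each step, and the strict fields force the counters.
countStrictPair :: [String] -> (Int, Int)
countStrictPair ls = case foldl' step (P 0 0) ls of
    P w l -> (w, l)
  where
    step (P w l) r = P (w + length r) (l + 1)

-- Option 2: ordinary lazy pair, but seq evaluates both components
-- before the pair is returned to foldl'.
countSeq :: [String] -> (Int, Int)
countSeq = foldl' step (0, 0)
  where
    step (w, l) r =
        let w' = w + length r
            l' = l + 1
        in w' `seq` l' `seq` (w', l')

-- Formats the result like the question's second version.
render :: (Int, Int) -> String
render (w, l) = "line:" ++ show l ++ "\nwords:" ++ show w ++ "\n"

main :: IO ()
main = do
    hSetBinaryMode stdin True
    interact (render . countStrictPair . lines)
```

Both options rely on foldl' evaluating the accumulator to weak head normal form at each step. They differ only in how that evaluation is made to reach the two counters.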