This is my code for counting lines and words:
import System.IO
import Data.List
main = do
hSetBinaryMode stdin True
interact $ (\(w,l)->"line:"++(show l)++"\nwords:"++(show w)++"\n")
. foldl' (\(w,l) r-> w `seq` l `seq` (w+length r ,succ l) ) (0,0)
. lines
This takes about 10 seconds to run on a file of about 100 megabytes. I compared it to similar programs in Lua (9s), awk (20s), and wc -l -c (0.6s).
Why is this code so slow? What could be the problem?
I/O with
Stringis known to be less than fast in Haskell. The bytes read from the handle in general have to be converted to Unicode code points, and then a linked list is built from those. That’s a lot of work causing a lot of allocation. In this case, the conversion to code points is a bit simpler, since you set stdin to binary mode, but the construction of the linked list of characters still takes a lot of time.Another small factor is that your line count is using
Integer, but that’s minor and only plays a significant role when the I/O is up to speed.If you need fast I/O, you have to use a type better suited for that. One possibility is using
ByteString, for exampledoes the job on a 94MB file in 0.12s on my box (wc -l -c takes 0.06s), while the original using
Stringtook 4.4s. It can be optimised further,takes only 0.08s, which is decent enough for me to stop optimising there (a similar change for the
Stringversion brings the time down to 3.6s for that).