I am trying to come up with equivalent of wc -l using Haskell Iteratee


Asked: 2026-05-26T15:41:05+00:00


I am trying to write an equivalent of “wc -l” using the Haskell iteratee library. Below is the code for “wc -c” (it counts bytes rather than lines, and is similar to the length example in the iteratee package on Hackage); it runs very fast:

{-# LANGUAGE BangPatterns #-}
import Data.Iteratee as I
import Data.ListLike as LL
import Data.Iteratee.IO
import Data.ByteString


length1 :: (Monad m, Num a, LL.ListLike s el) => Iteratee s m a
length1 = liftI (step 0)
  where
    step !i (Chunk xs) = liftI (step $ i + fromIntegral (LL.length xs))
    step !i stream     = idone i stream
{-# INLINE length1 #-}
main = do
  i' <- enumFile 1024 "/usr/share/dict/words" (length1 :: (Monad m) => Iteratee ByteString m Int)
  result <- run i'
  print result
  {- Time measured on a linux x86 box: 
  $ time ./test ## above haskell compiled code
  4950996

  real    0m0.013s
  user    0m0.004s
  sys     0m0.007s

  $  time wc -c /usr/share/dict/words
  4950996 /usr/share/dict/words

  real    0m0.003s
  user    0m0.000s
  sys     0m0.002s
  -}

Now, how do I extend it to count the number of lines and still run fast? I wrote a version that uses Prelude.filter to keep only the “\n” characters and then takes the length, but it is slower than Linux “wc -l” because it uses too much memory and spends too long in GC (lazy evaluation, I guess). So I wrote another version using Data.ListLike.filter, but it does not type check. Help here would be appreciated:

  {-# LANGUAGE BangPatterns #-}
  import Data.Iteratee as I
  import Data.ListLike as LL
  import Data.Iteratee.IO
  import Data.ByteString
  import Data.Char
  import Data.ByteString.Char8 (pack)

  numlines :: (Monad m, Num a, LL.ListLike s el) => Iteratee s m a
  numlines = liftI $ step 0
    where
      step !i (Chunk xs) = liftI (step $i + fromIntegral (LL.length $ LL.filter (\x ->  x == Data.ByteString.Char8.pack "\n")  xs))
      step !i stream = idone i stream
  {-# INLINE numlines #-}

  main = do
    i' <- enumFile 1024 "/usr/share/dict/words" (numlines :: (Monad m) => Iteratee ByteString m Int)
    result <- run i'
    print result

1 Answer

Editorial Team · Answer 1 · 2026-05-26T15:41:05+00:00

There are a lot of good answers already; I have very little to offer performance-wise but a few style points.

First, I would write it this way:

import Prelude as P
import Data.Iteratee
import qualified Data.Iteratee as I
import qualified Data.Iteratee.IO as I
import qualified Data.ByteString as B
import Data.Char
import System.Environment

-- numLines has a concrete stream type so it's not necessary to provide an
-- annotation later.  It could have a more general type.
numLines :: Monad m => I.Iteratee B.ByteString m Int
numLines = I.foldl' step 0
 where
  --step :: Int -> Word8 -> Int
  step acc el = if el == (fromIntegral $ ord '\n') then acc + 1 else acc

main = do
  f:_   <- getArgs
  -- n is the number of newline bytes in the file, not a word count
  n     <- run =<< I.enumFile 65536 f numLines
  print n

The biggest difference is that this uses Data.Iteratee.ListLike.foldl'. Note that only the individual stream elements matter to the step function, not the stream type. It’s exactly the same function as you would use with e.g. Data.ByteString.Lazy.foldl'.
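To make the point about the step function concrete, here is a minimal sketch that reuses the same step with the strict and lazy ByteString folds instead of an iteratee. It only depends on the bytestring package; the names countStrict and countLazy are hypothetical and are not part of the iteratee API.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Char (ord)
import Data.Word (Word8)

-- Same step as in numLines: it only ever sees one Word8 at a time,
-- so it does not care which container the bytes come from.
step :: Int -> Word8 -> Int
step acc el = if el == fromIntegral (ord '\n') then acc + 1 else acc

-- Hypothetical helper: count newline bytes in a strict ByteString.
countStrict :: B.ByteString -> Int
countStrict = B.foldl' step 0

-- Hypothetical helper: count newline bytes in a lazy ByteString.
countLazy :: BL.ByteString -> Int
countLazy = BL.foldl' step 0
```

Only the fold changes between the two helpers; the step function is shared unchanged, which is the same situation as with I.foldl'.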

Using foldl' also means that you don’t need to manually write iteratees with liftI. I would discourage users from doing so unless absolutely necessary. The result is usually longer and harder to maintain with little to no benefit.
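As for why the hand-written version in the question does not type check: as far as I can tell, the predicate passed to LL.filter receives a single stream element (a Word8 when the stream is a ByteString), but the lambda compares it against a whole ByteString built with pack. The sketch below shows per-chunk counting with a Word8 predicate, using plain bytestring functions rather than iteratee; newlinesInChunk, newlinesViaFilter and countChunks are hypothetical names, and the list of chunks stands in for what an enumerator would feed to the iteratee.

```haskell
import qualified Data.ByteString as B
import Data.List (foldl')
import Data.Word (Word8)

-- The byte value of '\n'. The element type of a ByteString is Word8,
-- so this is what a filter predicate has to compare against.
newline :: Word8
newline = 10

-- Count newline bytes in one chunk without building an intermediate value.
newlinesInChunk :: B.ByteString -> Int
newlinesInChunk = B.count newline

-- The filter-then-length formulation, with a predicate on Word8.
-- It gives the same result but allocates a filtered copy of the chunk.
newlinesViaFilter :: B.ByteString -> Int
newlinesViaFilter = B.length . B.filter (== newline)

-- Strictly accumulate the per-chunk counts over a sequence of chunks.
countChunks :: [B.ByteString] -> Int
countChunks = foldl' (\acc chunk -> acc + newlinesInChunk chunk) 0
```

If you do keep a liftI-based iteratee, giving it a concrete ByteString stream type and using a per-chunk count like this in the Chunk case is probably the smallest change.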

Finally, I’ve increased the buffer size significantly. On my system this is marginally faster than enumerator’s default of 4096, which is again marginally faster (with iteratee) than your choice of 1024. YMMV with this setting, of course.
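If you want to experiment with the buffer size outside of iteratee, a rough sketch along these lines reads the file in fixed-size chunks with hGetSome and counts newline bytes per chunk. This is a plain-IO stand-in and not how enumFile is implemented; countLinesWith is a hypothetical name and the buffer size is just a parameter to vary.

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: count newline bytes in a file, reading at most
-- bufSize bytes per read call.
countLinesWith :: Int -> FilePath -> IO Int
countLinesWith bufSize path = withBinaryFile path ReadMode (go 0)
  where
    go :: Int -> Handle -> IO Int
    go acc h = do
      chunk <- B.hGetSome h bufSize
      if B.null chunk
        then pure acc                            -- empty chunk means end of file
        else
          let acc' = acc + B.count 10 chunk      -- 10 is the byte for '\n'
          in acc' `seq` go acc' h                -- force the count before looping
```

Timing countLinesWith 1024, countLinesWith 4096 and countLinesWith 65536 against the same file is one way to see how much the setting matters on a given machine.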
