I did some Criterion benchmarks to estimate how much performance I lose by running

Question

0

Asked: May 26, 20262026-05-26T19:16:18+00:00 2026-05-26T19:16:18+00:00

I did some Criterion benchmarks to estimate how much performance I lose by running

0

I did some Criterion benchmarks to estimate how much performance I lose by running my code over a monad stack. The results were rather curious, and I have probably stumbled upon some laziness pitfall in my benchmark.

The benchmark tells me that running WriterT String IO is 20 times(!) slower than running plain IO, even when not using tell. Weirdly, if I stack WriterT with ReaderT and ContT it is just 5 times slower. This probably is a bug in my benchmark. What am I doing wrong here?

The benchmark

{-#LANGUAGE BangPatterns#-}
module Main where
import Criterion.Main
import Control.Monad
import Control.Monad.Writer
import Control.Monad.Reader
import Control.Monad.Cont

process :: Monad m => Int -> m Int
process = foldl (>=>) return (replicate 100000 (\(!x) -> return (x+1)))

test n = process n >> return ()

main = defaultMain [
      bench "Plain"  t0
     ,bench "Writer" t1
     ,bench "Reader" t2
     ,bench "Cont"   t3
     ,bench "RWC"    t4
    ]

t0 = test 1 :: IO ()
t1 = (runWriterT  (test 1:: WriterT String IO ()) >> return ()) :: IO ()
t2 = (runReaderT (test 1:: ReaderT String IO ()) "" >> return ()) :: IO ()
t3 = (runContT   (test 1:: ContT () IO ()) (return) >> return ()) :: IO ()
t4 = ((runWriterT . flip runReaderT "" . flip runContT return $
      (test 1 :: ContT () (ReaderT String (WriterT String IO)) ())) >> return ()) :: IO ()

The results

benchmarking Plain
mean: 1.938814 ms, lb 1.846508 ms, ub 2.052165 ms, ci 0.950
std dev: 519.7248 us, lb 428.4684 us, ub 709.3670 us, ci 0.950

benchmarking Writer
mean: 39.50431 ms, lb 38.25233 ms, ub 40.74437 ms, ci 0.950
std dev: 6.378220 ms, lb 5.738682 ms, ub 7.155760 ms, ci 0.950

benchmarking Reader
mean: 12.52823 ms, lb 12.03947 ms, ub 13.09994 ms, ci 0.950
std dev: 2.706265 ms, lb 2.324519 ms, ub 3.462641 ms, ci 0.950

benchmarking Cont
mean: 8.100272 ms, lb 7.634488 ms, ub 8.633348 ms, ci 0.950
std dev: 2.562829 ms, lb 2.281561 ms, ub 2.878463 ms, ci 0.950

benchmarking RWC
mean: 9.871992 ms, lb 9.436721 ms, ub 10.37302 ms, ci 0.950
std dev: 2.387364 ms, lb 2.136819 ms, ub 2.721750 ms, ci 0.950

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-26T19:16:19+00:00

As you’ve noticed, the lazy writer monad is quite slow. Using the strict version as Daniel Fischer suggests helps a lot, but why does it become so much faster when used in the big stack?

To answer this question, we take a look at the implementation of these transformers. First, the lazy writer monad transformer.

newtype WriterT w m a = WriterT { runWriterT :: m (a, w) }

instance (Monoid w, Monad m) => Monad (WriterT w m) where
    return a = WriterT $ return (a, mempty)
    m >>= k  = WriterT $ do
        ~(a, w)  <- runWriterT m
        ~(b, w') <- runWriterT (k a)
        return (b, w `mappend` w')

As you can see, this does quite a lot. It runs the actions of the underlying monad, does some pattern matching and gathers the written values. Pretty much what you’d expect. The strict version is similar, only without irrefutable (lazy) patterns.

newtype ReaderT r m a = ReaderT { runReaderT :: r -> m a }

instance (Monad m) => Monad (ReaderT r m) where
    return   = lift . return
    m >>= k  = ReaderT $ \ r -> do
        a <- runReaderT m r
        runReaderT (k a) r

The reader transformer is a bit leaner. It distributes the reader environment and calls upon the underlying monad to perform the actions. No surprises here.

Now, let’s look at ContT.

newtype ContT r m a = ContT { runContT :: (a -> m r) -> m r }

instance Monad (ContT r m) where
    return a = ContT ($ a)
    m >>= k  = ContT $ \c -> runContT m (\a -> runContT (k a) c)

Notice anything different? It does not actually use any functions from the underlying monad! In fact, it doesn’t even require m to be a monad. That means that no slow pattern matching or appends are being done at all. Only when you actually try to lift any actions from the underlying monad does ContT use its bind operator.

instance MonadTrans (ContT r) where
    lift m = ContT (m >>=)

So since you’re not actually doing any writer-specific stuff, ContT avoids using the slow bind operator from WriterT. That’s why having ContT on top of your stack makes it so much faster, and why the run time of the ContT () IO () is so similar to that of the deeper stack.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I did some Criterion benchmarks to estimate how much performance I lose by running

The benchmark

The results

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply