The Archive Base


Editorial Team
Asked: May 26, 2026 (2026-05-26T19:16:18+00:00)


I did some Criterion benchmarks to estimate how much performance I lose by running my code over a monad stack. The results were rather curious, and I have probably stumbled upon some laziness pitfall in my benchmark.

The benchmark tells me that running WriterT String IO is 20 times(!) slower than running plain IO, even when tell is never called. Oddly, if I stack WriterT with ReaderT and ContT, it is only 5 times slower. This is probably a bug in my benchmark. What am I doing wrong here?

The benchmark

{-# LANGUAGE BangPatterns #-}
module Main where
import Criterion.Main
import Control.Monad
import Control.Monad.Writer
import Control.Monad.Reader
import Control.Monad.Cont

process :: Monad m => Int -> m Int
process = foldl (>=>) return (replicate 100000 (\(!x) -> return (x+1)))

test n = process n >> return ()

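-- Recent criterion releases need an explicit wrapper such as whnfIO here;
-- older releases accepted a bare IO action as a benchmark.
main :: IO ()
main = defaultMain
    [ bench "Plain"  (whnfIO t0)
    , bench "Writer" (whnfIO t1)
    , bench "Reader" (whnfIO t2)
    , bench "Cont"   (whnfIO t3)
    , bench "RWC"    (whnfIO t4)
    ]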

t0 = test 1 :: IO ()
t1 = (runWriterT  (test 1:: WriterT String IO ()) >> return ()) :: IO ()
t2 = (runReaderT (test 1:: ReaderT String IO ()) "" >> return ()) :: IO ()
t3 = (runContT   (test 1:: ContT () IO ()) (return) >> return ()) :: IO ()
t4 = ((runWriterT . flip runReaderT "" . flip runContT return $
      (test 1 :: ContT () (ReaderT String (WriterT String IO)) ())) >> return ()) :: IO ()

The results

benchmarking Plain
mean: 1.938814 ms, lb 1.846508 ms, ub 2.052165 ms, ci 0.950
std dev: 519.7248 us, lb 428.4684 us, ub 709.3670 us, ci 0.950

benchmarking Writer
mean: 39.50431 ms, lb 38.25233 ms, ub 40.74437 ms, ci 0.950
std dev: 6.378220 ms, lb 5.738682 ms, ub 7.155760 ms, ci 0.950

benchmarking Reader
mean: 12.52823 ms, lb 12.03947 ms, ub 13.09994 ms, ci 0.950
std dev: 2.706265 ms, lb 2.324519 ms, ub 3.462641 ms, ci 0.950

benchmarking Cont
mean: 8.100272 ms, lb 7.634488 ms, ub 8.633348 ms, ci 0.950
std dev: 2.562829 ms, lb 2.281561 ms, ub 2.878463 ms, ci 0.950

benchmarking RWC
mean: 9.871992 ms, lb 9.436721 ms, ub 10.37302 ms, ci 0.950
std dev: 2.387364 ms, lb 2.136819 ms, ub 2.721750 ms, ci 0.950

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 7:16 pm (2026-05-26T19:16:19+00:00)

    As you’ve noticed, the lazy writer monad is quite slow. Using the strict version as Daniel Fischer suggests helps a lot, but why does it become so much faster when used in the big stack?
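For reference, switching to the strict writer only means changing which module the transformer comes from (Control.Monad.Writer re-exports the lazy variant). The following is a minimal sketch rather than the original benchmark: the qualified imports, the names runLazy and runStrict, and the smaller step count of 1000 are assumptions made for illustration.

```haskell
import Control.Monad ((>=>))
import qualified Control.Monad.Trans.Writer.Lazy as Lazy
import qualified Control.Monad.Trans.Writer.Strict as Strict

-- Same shape as the question's process, with a smaller (arbitrary) step count.
process :: Monad m => Int -> m Int
process = foldl (>=>) return (replicate 1000 (\x -> return (x + 1)))

-- Hypothetical helper: runs the chain in the lazy WriterT String IO.
runLazy :: IO (Int, String)
runLazy = Lazy.runWriterT (process 1)

-- Hypothetical helper: runs the same chain in the strict WriterT String IO.
runStrict :: IO (Int, String)
runStrict = Strict.runWriterT (process 1)

main :: IO ()
main = do
  runLazy >>= print    -- (1001,"")
  runStrict >>= print  -- (1001,"")
```

Both variants compute the same result; only the way their bind pattern matches on the intermediate pairs differs.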

    To answer this question, we take a look at the implementation of these transformers. First, the lazy writer monad transformer.

    newtype WriterT w m a = WriterT { runWriterT :: m (a, w) }
    
    instance (Monoid w, Monad m) => Monad (WriterT w m) where
        return a = WriterT $ return (a, mempty)
        m >>= k  = WriterT $ do
            ~(a, w)  <- runWriterT m
            ~(b, w') <- runWriterT (k a)
            return (b, w `mappend` w')
    

    As you can see, this does quite a lot. It runs the actions of the underlying monad, does some pattern matching and gathers the written values. Pretty much what you’d expect. The strict version is similar, only without irrefutable (lazy) patterns.
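The difference between the two kinds of pattern can be seen in isolation. This is a small sketch with made-up function names (lazyMatch, strictMatch): an irrefutable pattern defers the match until one of its variables is needed, whereas an ordinary pattern forces the pair right away.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Irrefutable pattern: the argument is not inspected at all here.
lazyMatch :: (Int, String) -> Int
lazyMatch ~(_, _) = 0

-- Ordinary pattern: the argument is forced to a pair constructor first.
strictMatch :: (Int, String) -> Int
strictMatch (_, _) = 0

main :: IO ()
main = do
  print (lazyMatch undefined)  -- 0, the pair is never forced
  r <- try (evaluate (strictMatch undefined)) :: IO (Either SomeException Int)
  putStrLn (either (const "strict match forced the pair") show r)
```

In the lazy WriterT bind, each deferred match has to be kept around until its result is demanded, which is a plausible source of the extra cost over the strict version.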

    newtype ReaderT r m a = ReaderT { runReaderT :: r -> m a }
    
    instance (Monad m) => Monad (ReaderT r m) where
        return   = lift . return
        m >>= k  = ReaderT $ \ r -> do
            a <- runReaderT m r
            runReaderT (k a) r
    

    The reader transformer is a bit leaner. It distributes the reader environment and calls upon the underlying monad to perform the actions. No surprises here.

    Now, let’s look at ContT.

    newtype ContT r m a = ContT { runContT :: (a -> m r) -> m r }
    
    instance Monad (ContT r m) where
        return a = ContT ($ a)
        m >>= k  = ContT $ \c -> runContT m (\a -> runContT (k a) c)
    

    Notice anything different? It does not actually use any functions from the underlying monad! In fact, it doesn’t even require m to be a monad. That means that none of the slow pattern matching or appending is done at all. Only when you actually lift an action from the underlying monad does ContT use that monad’s bind operator.
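To illustrate that m need not be a monad, here is a sketch that chains ContT computations over a type constructor with no Functor, Applicative or Monad instance. The type Opaque and the names step, chain and runChain are hypothetical and exist only for this example; it assumes the ContT from the transformers package.

```haskell
import Control.Monad.Trans.Cont (ContT (..))

-- A hypothetical type constructor that is deliberately not a monad.
newtype Opaque a = Opaque Int

-- One step of the chain; return here is ContT's own, not Opaque's.
step :: Int -> ContT r Opaque Int
step x = return (x + 1)

-- Three binds, all handled by ContT without touching Opaque.
chain :: ContT r Opaque Int
chain = return 0 >>= step >>= step >>= step

-- Supply a final continuation that wraps the result in Opaque.
runChain :: Int
runChain = case runContT chain Opaque of
  Opaque n -> n

main :: IO ()
main = print runChain  -- 3
```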

    instance MonadTrans (ContT r) where
        lift m = ContT (m >>=)
    

    So since you’re not actually doing any writer-specific stuff, ContT avoids using the slow bind operator from WriterT. That’s why having ContT on top of your stack makes it so much faster, and why the run time of the ContT () IO () benchmark is so similar to that of the deeper stack.
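One way to check this claim is to count how often the underlying monad's bind actually runs. The sketch below uses a hypothetical instrumented base type, Counted, whose (>>=) increments a counter; it intentionally breaks the monad laws and is meant purely as a measuring device. The chain length of 100 and all names are assumptions for illustration.

```haskell
import Control.Monad ((>=>))
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Cont (ContT (..))
import Control.Monad.Trans.Reader (ReaderT (..))

-- Hypothetical base "monad" that counts calls to its own (>>=).
-- Not law-abiding: it exists only to make binds observable.
data Counted a = Counted Int a

instance Functor Counted where
  fmap f (Counted n a) = Counted n (f a)

instance Applicative Counted where
  pure = Counted 0
  Counted n f <*> Counted m a = Counted (n + m) (f a)

instance Monad Counted where
  Counted n a >>= k = case k a of
    Counted m b -> Counted (n + m + 1) b

binds :: Counted a -> Int
binds (Counted n _) = n

-- Same construction as the question's process, with 100 steps.
chain :: Monad m => (Int -> m Int) -> Int -> m Int
chain step = foldl (>=>) return (replicate 100 step)

-- Running directly in Counted: every (>=>) costs one base bind.
bindsDirect :: Int
bindsDirect = binds (chain (\x -> return (x + 1)) 0)

-- ReaderT delegates each bind to the base monad.
bindsViaReader :: Int
bindsViaReader = binds (runReaderT (chain (\x -> return (x + 1)) 0) ())

-- ContT handles the binds itself; the base bind is never called.
bindsViaContT :: Int
bindsViaContT = binds (runContT (chain (\x -> return (x + 1)) 0) pure)

-- Lifting a base action brings the base bind back, once per lift.
bindsViaLift :: Int
bindsViaLift = binds (runContT (chain (\x -> lift (pure (x + 1))) 0) pure)

main :: IO ()
main = print (bindsDirect, bindsViaReader, bindsViaContT, bindsViaLift)
-- (100,100,0,100)
```

With WriterT in place of Counted, the zero in the third position corresponds to the lazy writer's bind being skipped entirely.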
