Another microbenchmark: Why is this loop (compiled with ghc -O2 -fllvm , 7.4.1, Linux

Question

0

Asked: June 14, 20262026-06-14T02:43:12+00:00 2026-06-14T02:43:12+00:00

Another microbenchmark: Why is this loop (compiled with ghc -O2 -fllvm , 7.4.1, Linux

0

Another microbenchmark: Why is this “loop” (compiled with ghc -O2 -fllvm, 7.4.1, Linux 64bit 3.2 kernel, redirected to /dev/null)

mapM_ print [1..100000000]

about 5x slower than a simple for-cycle in plain C with write(2) non-buffered syscall? I am trying to gather Haskell gotchas.

Even this slow C solution is much faster than Haskell

int i;
char buf[16];
for (i=0; i<=100000000; i++) {
    sprintf(buf, "%d\n", i);
    write(1, buf, strlen(buf));
}

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-14T02:43:14+00:00

Okay, on my box the C code, compiled per gcc -O3 takes about 21.5 seconds to run, the original Haskell code about 56 seconds. So not a factor of 5, a bit above 2.5.

The first nontrivial difference is that

mapM_ print [1..100000000]

uses Integers, that’s a bit slower because it involves a check upfront, and then works with boxed Ints, while the Show instance of Int does the conversion work on unboxed Int#s.

Adding a type signature, so that the Haskell code works on Ints,

mapM_ print [1 :: Int .. 100000000]

brings the time down to 47 seconds, a bit above twice the time the C code takes.

Now, another big difference is that show produces a linked list of Char and doesn’t just fill a contiguous buffer of bytes. That is slower too.

Then that linked list of Chars is used to fill a byte buffer that then is written to the stdout handle.

So, the Haskell code does more, and more complicated things than the C code, thus it’s not surprising that it takes longer.

Admittedly, it would be desirable to have an easy way to output such things more directly (and hence faster). However, the proper way to handle it is to use a more suitable algorithm (that applies to C too). A simple change to

putStr . unlines $ map show [0 :: Int .. 100000000]

almost halves the time taken, and if one wants it really fast, one uses the faster ByteString I/O and builds the output efficiently as exemplified in applicative’s answer.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Another microbenchmark: Why is this loop (compiled with ghc -O2 -fllvm , 7.4.1, Linux

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply