I tried to test Haskell performance, but got some unexpectedly poor results:
main = do
    putStrLn $ show $ sum' [1..1000000]

sum' :: [Int] -> Int
sum' [] = 0
sum' (x:xs) = x + sum' xs
I first ran it from ghci -O2:
> :set +s
> sum' [1..1000000]
1784293664
(4.81 secs, 163156700 bytes)
Then I compiled the code with ghc -O3, ran it using time, and got this:
1784293664
real 0m0.728s
user 0m0.700s
sys 0m0.016s
Needless to say, these results are abysmal compared to the C code:
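#include <stdio.h>

int main(void)
{
    int i, n;
    n = 0;
    /* 500000500000 does not fit in a 32-bit int, so the printed result has
       wrapped (signed overflow is formally undefined behavior in C). */
    for (i = 1; i <= 1000000; ++i)
        n += i;
    printf("%d\n", n);
    return 0;
}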
After compiling it with gcc -O3 and running it with time I got:
1784293664
real 0m0.022s
user 0m0.000s
sys 0m0.000s
What is the reason for such poor performance? I assumed that Haskell would never actually construct the list; am I wrong in that assumption? Or is it something else?
Update: Is the problem that Haskell doesn’t know that addition is associative? Is there a way to make it see and use that?
First, don’t bother to discuss GHCi when you’re talking about performance. It’s nonsense to use -Ox flags with GHCi.

You’re Building Up A Huge Computation
Using GHC 7.2.2 x86-64 with -O2 I get:

The reason this uses so much stack space is that on every iteration you build an expression of the form i + ..., so your computation is transformed into a huge thunk:

    sum' [1..1000000] = 1 + sum' [2..1000000]
                      = 1 + (2 + sum' [3..1000000])
                      = 1 + (2 + (3 + sum' [4..1000000]))
                      = ...

That’s going to take a lot of memory. There is a reason the standard sum isn’t defined like your sum'.

With A Reasonable Definition for sum

If I change your sum' to sum or an equivalent such as foldl' (+) 0 then I get:

Which seems entirely reasonable to me. Keep in mind that, with such a short-running piece of code, much of your measured time is noise (loading the binary, starting up the RTS and GC nursery, miscellaneous initializations, etc.). Use Criterion (a benchmarking tool) if you want accurate measurements of small-ish Haskell computations.
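The difference between the two styles can be sketched in a small program. This is a minimal illustration, not the benchmark from this answer: the names sumLazy, sumAcc and sumFold are hypothetical, and it assumes a GHC with the BangPatterns extension and Data.List.foldl'. It also touches on the update in the question: turning the right-nested 1 + (2 + (3 + ...)) into the left-nested ((0 + 1) + 2) + ... is valid because (+) on Int is associative with 0 as its identity, and a strict left fold is the usual way to express that.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.List (foldl')

-- The question's definition (renamed). Each step leaves a pending "x + ..."
-- that can only be resolved after the recursive call returns, so a million
-- pending additions pile up before any of them is performed.
sumLazy :: [Int] -> Int
sumLazy []       = 0
sumLazy (x : xs) = x + sumLazy xs

-- Tail-recursive version with a strict accumulator. The bang patterns force
-- the running total at every step, so only one Int is carried along.
-- It computes ((0 + 1) + 2) + ..., which equals the lazy result because
-- (+) is associative and 0 is its identity.
sumAcc :: [Int] -> Int
sumAcc = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs

-- The library route mentioned in the answer: a strict left fold.
sumFold :: [Int] -> Int
sumFold = foldl' (+) 0

main :: IO ()
main = do
  print (sumAcc [1 .. 1000000])
  print (sumFold [1 .. 1000000])
```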
Comparing to C
My gcc -O3 time is immeasurably low (reported as 0.002 seconds) because the main routine consists of 4 instructions: the entire computation is evaluated at compile time and the constant 0x746a5a2920 is stored in the binary.

There is a rather long Haskell thread (here, but beware, it’s something of an epic flame war that still burns in people’s minds almost 3 years later) where people discuss the realities of doing this in GHC, starting from your exact benchmark. It isn’t there yet, but they did come up with some Template Haskell work that would do this if you wish to achieve the same results selectively.
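The Template Haskell work from that thread isn’t reproduced here. As a separate minimal sketch of the same idea, assuming GHC’s bundled template-haskell package (precomputedSum is a hypothetical name), a splice can run the sum while the module is being compiled, so the executable carries only the resulting constant, much as gcc’s output did. This trades compile time for run time and only works for inputs known at compile time.

```haskell
{-# LANGUAGE TemplateHaskell #-}
module Main (main) where

import Language.Haskell.TH.Syntax (lift)

-- The splice is evaluated by GHC during compilation: the sum is computed
-- then, and 'lift' turns the resulting Int into a literal expression, so the
-- compiled program just contains that constant.
-- Only imported functions (sum, lift) appear inside the splice, which keeps
-- it within Template Haskell's stage restriction.
precomputedSum :: Int
precomputedSum = $(lift (sum [1 .. 1000000 :: Int]))

main :: IO ()
main = print precomputedSum
```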