This code is taken from the book “The Haskell Road to Logic, Maths and Programming”. It implements the Sieve of Eratosthenes and solves Project Euler Problem 10.

    sieve :: [Integer] -> [Integer]
    sieve (0 : xs) = sieve xs
    sieve (n : xs) = n : sieve (mark xs 1 n)
      where
        mark :: [Integer] -> Integer -> Integer -> [Integer]
        mark (y:ys) k m | k == m    = 0 : (mark ys 1 m)
                        | otherwise = y : (mark ys (k+1) m)

    primes :: [Integer]
    primes = sieve [2..]

    -- Project Euler #10
    main = print $ sum $ takeWhile (< 2000000) primes

It actually runs even slower than the naive prime test.
Can someone explain this behaviour?
I suspect it has something to do with iterating over each element of the list in the mark function.
Thanks.
You are building up a quadratic number of unevaluated thunks using this algorithm. The algorithm relies on laziness so heavily that this is also the reason why it doesn’t scale.
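One way to see the growth is to instrument the sieve so that each prime is paired with the number of mark layers already wrapped around the rest of the list when that prime is emitted. This is only a sketch: `sieveWithDepth` and `primesWithDepth` are hypothetical names that are not in the original code, and `mark` is the question's helper lifted to the top level.

```haskell
-- Same marking helper as in the question, lifted to the top level.
mark :: [Integer] -> Integer -> Integer -> [Integer]
mark (y : ys) k m
  | k == m    = 0 : mark ys 1 m
  | otherwise = y : mark ys (k + 1) m
mark [] _ _ = []

-- Hypothetical instrumented sieve (not from the book): pairs each prime
-- with the number of mark layers wrapped around the list at that point.
sieveWithDepth :: Int -> [Integer] -> [(Integer, Int)]
sieveWithDepth depth (0 : xs) = sieveWithDepth depth xs
sieveWithDepth depth (n : xs) =
  (n, depth) : sieveWithDepth (depth + 1) (mark xs 1 n)
sieveWithDepth _ [] = []

primesWithDepth :: [(Integer, Int)]
primesWithDepth = sieveWithDepth 0 [2 ..]
```

In GHCi, `take 5 primesWithDepth` gives `[(2,0),(3,1),(5,2),(7,3),(11,4)]`: every prime found adds one more mark layer, and every later element of the list has to pass through all of them.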
Let’s walk through how it works, which should make the problem apparent. For simplicity, let’s say that we want to print the elements of primes ad infinitum, i.e. we want to evaluate each cell in the list one after the other. primes is defined as:

    primes = sieve [2..]

Since 2 isn’t 0, the second definition of sieve applies, and 2 is added to the list of primes, and the rest of the list is an unevaluated thunk (I use tail instead of the pattern match n : xs in sieve for xs, so tail isn’t actually being called, and doesn’t add any overhead in the code below; mark is actually the only thunked function):

    primes = 2 : sieve (mark (tail [2..]) 1 2)

Now we want the second primes element. So, we walk through the code (exercise for the reader) and end up with:

    primes = 2 : 3 : sieve (mark (tail (mark (tail [2..]) 1 2)) 1 3)

Same procedure again, we want to evaluate the next prime…

    primes = 2 : 3 : 5 : sieve (mark (tail (tail (mark (tail (mark (tail [2..]) 1 2)) 1 3))) 1 5)

This is starting to look like LISP, but I digress… Starting to see the problem? For each element in the primes list, an increasingly large stack of thunked mark applications has to be evaluated. In other words, for each element in the list, there has to be a check for whether that element is marked by any of the preceding primes, by evaluating each mark application in the stack. So, for n ~= 2000000, the Haskell runtime has to call functions resulting in a call stack with a depth of about … I don’t know, 137900 (let n = 2e6 in n / log n gives a lower bound)? Something like that. This is probably what causes the slow-down; maybe vacuum can tell you more (I don’t have a computer with both Haskell and a GUI right now).

The reason why the sieve of Eratosthenes works in languages like C is that:
n, resulting in no call stack overheads at all.
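To make the contrast concrete, here is a minimal sketch of that array-based approach, written in Haskell rather than C. It assumes the array package that ships with GHC and a fixed upper bound instead of an infinite list. The name `primesUpTo` is hypothetical and not from the original post. All multiples of a prime are crossed off in a mutable array before the loop moves on to the next candidate, so no stack of pending mark calls builds up.

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, assocs)

-- | All primes <= limit, using a mutable unboxed array of flags.
-- Hypothetical helper for illustration; assumes limit fits in an Int.
primesUpTo :: Int -> [Int]
primesUpTo limit
  | limit < 2 = []
  | otherwise = [i | (i, True) <- assocs flags]
  where
    flags :: UArray Int Bool
    flags = runSTUArray $ do
      -- Start with every number in [2 .. limit] flagged as prime.
      arr <- newArray (2, limit) True
      -- Only candidates up to sqrt limit need to be used for crossing off.
      forM_ (takeWhile (\p -> p * p <= limit) [2 ..]) $ \p -> do
        isPrime <- readArray arr p
        -- Cross off every multiple of p right away, starting at p * p.
        when isPrime $
          forM_ [p * p, p * p + p .. limit] $ \multiple ->
            writeArray arr multiple False
      return arr
```

With this sketch, the Project Euler sum would be `sum (primesUpTo 1999999)`, assuming a 64-bit Int so that the sum does not overflow.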