I was working through some easy problems to get better at Scala and wrote the following program to calculate primes using a Sieve of Eratosthenes. When I bumped up the upper bound (MAX_PRIME), I noticed that my CPU would max out during the calculation. I have no clue why it’s using more than one core. I was afraid that would muck up the answer, but the result appears to be correct across multiple runs, so apparently it doesn’t. I’m not using .par anywhere, and almost all of my logic is in for-comprehensions.
Edit: I’m using Scala 2.9.1.
object Main {
  val MAX_PRIME = 10000000

  def main(args: Array[String]) {
    println("Generating array")
    val primeChecks = scala.collection.mutable.ArrayBuffer.fill(MAX_PRIME + 1)(true)
    // Neither 0 nor 1 is prime
    primeChecks(0) = false
    primeChecks(1) = false

    println("Finding primes")
    for (
      i ← 2 to MAX_PRIME if primeChecks(i);
      j ← i * 2 to MAX_PRIME by i
    ) primeChecks(j) = false

    println("Filtering primes")
    val primes = for { (status, num) ← primeChecks.zipWithIndex if status } yield num
    println("Found %d prime numbers!".format(primes.length))

    println("Saving the primes")
    val formatter = new java.util.Formatter("primes.txt", "UTF-8")
    try {
      for (prime ← primes)
        formatter.format("%d%n", prime.asInstanceOf[Object])
    }
    finally {
      try { formatter.close } catch { case _ ⇒ }
    }
  }
}
Edit 2: You can reproduce the multi-threaded behavior by running the following snippet in a REPL, so it seems to be caused by the for-comprehension (at least in Scala 2.9.1).
val max = 10000000
val t = scala.collection.mutable.ArrayBuffer.fill(max + 1)(true)
for (
  i <- 2 to max if t(i);
  j <- i * 2 to max by i
) t(j) = false
It’s not your code that’s using multiple threads, it’s the JVM. What you are seeing is the GC kicking in. If I increase MAX_PRIME to 1000000000 and give the JVM 6 GB of Java heap to play with, I see a steady state of 100% of one CPU and about 4 GB of memory. Every so often the GC kicks in, and usage then rises to two CPUs. The following Java stack trace (pruned for clarity) shows what’s running inside the JVM:
There’s only one thread (main) running Scala code; all the others are internal JVM threads. Note in particular that there are 4 GC threads in this case. That’s because I’m running this on a 4-way machine, and by default the JVM allocates one GC thread per core. The exact setup will depend on the particular mix of platform, JVM and command-line flags in use.
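One way to check the GC explanation from inside the program, rather than from a stack dump, is to read the JVM’s collector counters before and after the sieve. The following is only a minimal sketch: the names GcProbe, totalGcCount and sieve are made up for illustration, the bound is smaller than in the question, and it assumes a reasonably recent Scala on a standard JVM. It relies only on java.lang.management, which is part of the JDK.

```scala
import java.lang.management.ManagementFactory

// Hypothetical helper object, not part of the original program.
object GcProbe {
  // Sum of the collection counts reported by every collector in this JVM.
  // A collector may report -1 when the count is undefined; treat that as 0.
  def totalGcCount(): Long = {
    val beans = ManagementFactory.getGarbageCollectorMXBeans
    var total = 0L
    var i = 0
    while (i < beans.size) {
      total += math.max(0L, beans.get(i).getCollectionCount)
      i += 1
    }
    total
  }

  // Same loop shape as the snippet in the question; returns the number of primes <= max.
  def sieve(max: Int): Int = {
    val flags = scala.collection.mutable.ArrayBuffer.fill(max + 1)(true)
    flags(0) = false
    flags(1) = false
    for (i <- 2 to max if flags(i); j <- i * 2 to max by i) flags(j) = false
    flags.count(b => b)
  }

  def main(args: Array[String]): Unit = {
    val before = totalGcCount()
    val primes = sieve(1000000)
    val after = totalGcCount()
    println("cores visible to the JVM: " + Runtime.getRuntime.availableProcessors)
    println("primes found: " + primes)
    println("GC runs during the sieve: " + (after - before))
  }
}
```

If the reported number of GC runs is greater than zero while only the main thread runs your code, the extra CPU usage is coming from the collector. On a HotSpot JVM you can also try running with -XX:+UseSerialGC, which selects a single-threaded collector; if the explanation above is right, CPU usage should then stay at about one core.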
If you want to understand the details (it’s complicated!), the following links should get you started: