Consider the following Set benchmark:
import scala.collection.immutable._
object SetTest extends App {
def time[a](f: => a): (a,Double) = {
val start = System.nanoTime()
val result: a = f
val end = System.nanoTime()
(result, 1e-9*(end-start))
}
for (n <- List(1000000,10000000)) {
println("n = %d".format(n))
val (s2,t2) = time((Set() ++ (1 to n)).sum)
println("sum %d, time %g".format(s2,t2))
}
}
Compiling and running produces
tile:scalafab% scala SetTest
n = 1000000
sum 1784293664, time 0.982045
n = 10000000
Exception in thread "Poller SunPKCS11-Darwin" java.lang.OutOfMemoryError: Java heap space
...
I.e., Scala is unable to represent a set of 10 million Ints on a machine with 8 GB of memory. Is this expected behavior? Is there some way to reduce the memory footprint?
Generic immutable sets do take a lot of memory. The default is for only 256M heap, which leaves only 26 bytes per object. The hash trie for immutable sets generally takes
one to two hundred bytes per objectan extra 60 or so bytes per element. If you add-J-Xmx2Gon your command line to increase the heap space to 2G, you should be fine.(This level of overhead is one reason why there are bitsets, for example.)