I have benchmarked several ways to fold a large array of primitives (“direct” and with iterators), and the results are disappointing. (Yes, I did warm-up, intermediate GC, and many run passes; the JVM runs in server mode, and scalac optimisations are enabled with debugging info disabled.)
I think the code is too big to post here, so here is a link: http://pastebin.com/18dWWBM4
The only method there that runs nearly as fast as a plain old imperative loop is this not-so-generic hand-written function:
    @inline def array_foldl[@specialized A, @specialized B](init: B)(src: Array[A])(fun: (B, A) => B) = {
      var res = init
      var i = 0
      val len = src.length
      while (i < len) {
        res = fun(res, src(i))
        i += 1
      }
      res
    }
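For reference, a minimal sketch of how such a function might be called. The object name `FoldDemo` and the values `sum` and `max` are hypothetical and chosen only for illustration; the fold itself is repeated so the snippet compiles on its own.

```scala
object FoldDemo {
  // Same shape as the hand-written fold above, repeated here so the
  // sketch is self-contained. FoldDemo is a hypothetical name.
  @inline def array_foldl[@specialized A, @specialized B](init: B)(src: Array[A])(fun: (B, A) => B): B = {
    var res = init
    var i = 0
    val len = src.length
    while (i < len) {
      res = fun(res, src(i))
      i += 1
    }
    res
  }

  def main(args: Array[String]): Unit = {
    val xs = Array(1, 2, 3, 4, 5)

    // B is inferred from the first parameter list, A from the second,
    // so the lambda needs no type annotations.
    val sum = array_foldl(0)(xs)(_ + _)
    val max = array_foldl(Int.MinValue)(xs)((a, b) => if (a > b) a else b)

    println(sum) // 15
    println(max) // 5
  }
}
```

The curried parameter lists are what let the compiler infer `A` and `B` before it type-checks the function literal.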
The other, visually nicer methods are far behind. The iterator abstractions are slow in every case, although SpecializedIterator, a hand-written imitation of the standard Iterator, is slightly faster. So what’s the problem? Can it be improved somehow? Is there a way to make a “fast” iterator, or is the approach itself fundamentally flawed?
Thanks for your attention.
The problem is boxing. It takes a lot longer to create an object than to add two numbers, but if you use generic (non-specialized) folds, you have to create an object every time. The problem with just specializing everything is that you make the entire library 100x larger, since you need every combination of two primitive parameters (including with non-primitives), plus the original no-type-parameter version. (100x because there are 8 primitives plus Unit plus AnyRef/non-specialized T, which gives 10 choices for each of the two type parameters.) This is untenable, and since there is no readily available alternate solution, the collections are presently unspecialized.

Also, specialization itself is relatively new and thus still has some deficits in its implementation. In particular, you seem to have hit one with SpecializedIterator: the function in foreach doesn’t end up specialized (I collapsed the trait/object thing into a single class to make it easier to track down):

See the box at line 12, followed by a call to un-specialized Function1? Oops. (The tuple (A, (A,A) => A) used in sum also messes up specialization.) An implementation like this is full speed:

With results like so:
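Separately from the listings referred to above, the boxing argument at the start of this answer can be made concrete with a minimal sketch. It is not the answer's original code: the names `BoxingSketch`, `foldGeneric`, `foldInt`, `choicesPerParam` and `variantsForTwoParams` are hypothetical, and the comments describe what the compiler typically emits rather than measured results.

```scala
object BoxingSketch {
  // Generic, unspecialized fold. After erasure A and B are Object, so with
  // Int data each element read and each intermediate result is likely to
  // pass through a java.lang.Integer (unless the JIT manages to remove it).
  def foldGeneric[A, B](init: B)(src: Array[A])(fun: (B, A) => B): B = {
    var res = init
    var i = 0
    val len = src.length
    while (i < len) {
      res = fun(res, src(i))
      i += 1
    }
    res
  }

  // Hand-specialized fold for Int only. The accumulator stays a primitive,
  // and (Int, Int) => Int is one of the shapes Function2 is specialized
  // for, so the call does not need to box.
  def foldInt(init: Int)(src: Array[Int])(fun: (Int, Int) => Int): Int = {
    var res = init
    var i = 0
    val len = src.length
    while (i < len) {
      res = fun(res, src(i))
      i += 1
    }
    res
  }

  // The "100x" estimate: 8 primitives + Unit + AnyRef gives 10 choices
  // per type parameter, and there are two type parameters.
  val choicesPerParam: Int = 8 + 1 + 1
  val variantsForTwoParams: Int = choicesPerParam * choicesPerParam

  def main(args: Array[String]): Unit = {
    val xs = Array(1, 2, 3, 4)
    println(foldGeneric(0)(xs)(_ + _)) // 10
    println(foldInt(0)(xs)(_ + _))     // 10
    println(variantsForTwoParams)      // 100
  }
}
```

Both folds compute the same result; the difference is only in how values travel between the loop and the function, which is where the allocation cost described above would come from.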