I have benchmarked several ways to fold a large array of primitives (direct and

Question

0

Asked: June 18, 20262026-06-18T15:55:39+00:00 2026-06-18T15:55:39+00:00

I have benchmarked several ways to fold a large array of primitives (direct and

0

I have benchmarked several ways to fold a large array of primitives (“direct” and with iterators), and the results are disappointing. (Yes, I have done warmup, intermediate GC and many run passes, running JVM in server mode and scalac optimisations are enabled (and debugging info is disabled)).

I think code is too big to post here, so here is link: http://pastebin.com/18dWWBM4
The only method there that runs nearly as good as plain old imperative loop is this not-so-generic hand-written function:

@inline def array_foldl[@specialized A, @specialized B](init: B)(src: Array[A])(fun: (B, A) => B) = {
  var res = init
  var i = 0
  var len = src.length
  while (i < len) {
    res = fun(res, src(i))
    i += 1
  }
  res
}

Other visually nice methods are complete outsiders. Also, using iterator abstractions fails in all cases, with hand-written parody to the standart Iterator called SpecializedIterator being slightly faster. So what’s the problem? Can it be improved somehow? Is there a way to make “fast” iterator, or there is a big problem in the principle itself?
Thanks for attention.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-18T15:55:40+00:00

The problem is boxing. It takes a lot longer to create an object than to add two numbers, but if you use generic (non-specialized) folds, you have to create an object every time. The problem with just specializing everything is that you make the entire library 100x larger since you need every combination of two primitive parameters (including with non-primitives), plus the original no-type-parameter version. (100x because there are 8 primitives plus Unit plus AnyRef/non-specialized T.) This is untenable, and since there is no readily available alternate solution, the collections are presently unspecialized.

Also, specialization itself is relatively new and thus still has some deficits in its implementation. In particular, you seem to have hit one with SpecializedIterator: the function in foreach doesn’t end up specialized (I collapsed the trait/object thing into a single class to make it easier to track down):

public class Main$SpecializedArrayIterator$mcJ$sp extends Main$SpecializedArrayIterator{
public final void foreach$mcJ$sp(scala.Function1);
  Code:
   0:   aload_0
   1:   invokevirtual   #39; //Method Main$SpecializedArrayIterator.hasNext:()Z
   4:   ifeq    24
   7:   aload_1
   8:   aload_0
   9:   invokevirtual   #14; //Method next$mcJ$sp:()J
   12:  invokestatic    #45; //Method scala/runtime/BoxesRunTime.boxToLong:(J)Ljava/lang/Long;
   15:  invokeinterface #51,  2; //InterfaceMethod scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;
   20:  pop
   21:  goto    0
   24:  return

See the box at line 12, followed by a call to un-specialized Function1? Oops. (The tuple (A, (A,A) => A) used in sum also messes up specialization.) An implementation like this is full speed:

class SpecializedArrayIterator[@specialized A](src: Array[A]) {
  var i = 0
  val l = src.length
  @inline final def hasNext: Boolean = i < l
  @inline final def next(): A = { val res = src(i); i += 1; res }
  @inline final def foldLeft[@specialized B](z: B)(op: (B, A) => B): B = {
    var result = z
    while (hasNext) result = op(result,next)
    result
  }
}

...
measure((new SpecializedArrayIterator[Long](test)).foldLeft(0L)(_ + _))
...

With results like so:

Launched 51298 times in 2000 milliseconds, ratio = 25.649    // New impl
Launched 51614 times in 2000 milliseconds, ratio = 25.807    // While loop

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have benchmarked several ways to fold a large array of primitives (direct and

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply