I’m trying to use scala parallel collections to implement some cpu-intensive
task, I’ve wanted to abstract the way the algorithm can be executed
(sequentially, parallel or even distributed), but the code dosn’t work as I
would suspect and I have no idea what am I doing wrong.
The way I wanted to abstract this problem is mocked below:
// just measures time a block of code runs
def time(block: => Unit) : Long = {
val start = System.currentTimeMillis
block
val stop = System.currentTimeMillis
stop - start
}
// "lengthy" task
def work = {
Thread.sleep(100)
println("done")
1
}
import scala.collection.GenSeq
abstract class ContextTransform {
def apply[T](genSeq: GenSeq[T]): GenSeq[T]
}
object ParContextTransform extends ContextTransform {
override def apply[T](genSeq: GenSeq[T]): GenSeq[T] = genSeq.par
}
// this works as expected
def callingParDirectly = {
val range = (1 to 10).par
// make sure we really got a ParSeq
println(range)
for (i <- range) yield work
}
// this doesn't
def callingParWithContextTransform(contextTransform: ContextTransform) = {
val range = contextTransform(1 to 10)
// make sure we really got a ParSeq
println(range)
for (i <- range) yield work
}
The result from the interpreter:
scala> time(callingParDirectly)
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res20: Long = 503
scala> time(callingParWithContextTransform(ParContextTransform))
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res21: Long = 1002
My first bet was that the collection doesn’t split properly and the println’s of
“done” indeed suggest that… but the above code works well if I don’t yield
anything (just run the work method).
I can’t understand why the callingParWithContextTransform method doesn’t work
like callingParDirectly; what am I missing?
Possible culprit: SI-4843.