So I’ve been working with parallel collections in Scala for a graph project I’m working on, I’ve got the basics of the graph class defined, it is currently using a scala.collection.mutable.HashMap where the key is Int and the value is ListBuffer[Int] (adjacency list). (EDIT: This has since been change to ArrayBuffer[Int]
I had done a similar thing a few months ago in C++, with a std::vector<int, std::vector<int> >.
What I’m trying to do now is run a metric between all pairs of vertices in the graph, so in C++ I did something like this:
// myVec = std::vector<int> of vertices
for (std::vector<int>::iterator iter = myVec.begin(); iter != myVec.end(); ++iter) {
for (std::vector<int>::iterator iter2 = myVec.begin();
iter2 != myVec.end(); ++iter2) {
/* Run algorithm between *iter and *iter2 */
}
}
I did the same thing in Scala, parallelized, (or tried to) by doing this:
// vertexList is a List[Int] (NOW CHANGED TO Array[Int] - see below)
vertexList.par.foreach(u =>
vertexList.foreach(v =>
/* Run algorithm between u and v */
)
)
The C++ version is clearly single-threaded, the Scala version has .par so it’s using parallel collections and is multi-threaded on 8 cores (same machine). However, the C++ version processed 305,570 pairs in a span of roughly 3 days, whereas the Scala version so far has only processed 23,573 in 17 hours.
Assuming I did my math correctly, the single-threaded C++ version is roughly 3x faster than the Scala version. Is Scala really that much slower than C++, or am I completely mis-using Scala (I only recently started – I’m about 300 pages into Programming in Scala)?
Thanks!
-kstruct
EDIT To use a while loop, do I do something like..
// Where vertexList is an Array[Int]
vertexList.par.foreach(u =>
while (i <- 0 until vertexList.length) {
/* Run algorithm between u and vertexList(i) */
}
}
If you guys mean use a while loop for the entire thing, is there an equivalent of .par.foreach for whiles?
EDIT2 Wait a second, that code isn’t even right – my bad. How would I parallelize this using while loops? If I have some var i that keeps track of the iteration, then wouldn’t all threads be sharing that i?
From your comments, I see that your updating a shared mutable
HashMapat the end of each algorithm run. And if you’re randomizing your walks, a sharedRandomis also a contention point.I recommend two changes:
.mapand.flatMapto return an immutable collection instead of modifying a shared collection.ThreadLocalRandom(from either Akka or Java 7) to reduce contention on the random number generatorpVertexListandvertexListin the code below.Something like this:
The value
allResultwill be aParVector[((Int, Int), Int)]. You may call.toMapon it to convert that into aMap.