I was trying out the Java ForkJoin framework and wrote a simple test program that sets the pixels of an image to random colors, i.e. it generates pseudo-noise.
But while testing performance I found that it's actually faster to run it single-threaded than with multiple threads. I make it run single-threaded by passing a high threshold.
This is the worker class:
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.Random;
import java.util.concurrent.RecursiveAction;

public class Noise extends RecursiveAction {
    private BufferedImage image;
    private int xMin;
    private int yMin;
    private int xMax; // exclusive
    private int yMax; // exclusive
    private int threshold = 2000000; // max pixels per thread

    public Noise(BufferedImage image, int xMin, int yMin, int xMax, int yMax, int threshold) {
        this.image = image;
        this.xMin = xMin;
        this.yMin = yMin;
        this.xMax = xMax;
        this.yMax = yMax;
        this.threshold = threshold;
    }

    public Noise(BufferedImage image, int xMin, int yMin, int xMax, int yMax) {
        this.image = image;
        this.xMin = xMin;
        this.yMin = yMin;
        this.xMax = xMax;
        this.yMax = yMax;
    }

    @Override
    protected void compute() {
        int ppt = (xMax - xMin) * (yMax - yMin); // pixels per thread
        if (ppt > threshold) {
            // split at the middle column; xMax is exclusive, so both halves
            // use verdeling as their shared boundary and no column is skipped
            int verdeling = ((xMax - xMin) / 2) + xMin;
            // pass the threshold on so the subtasks do not fall back to the default
            invokeAll(new Noise(image, xMin, yMin, verdeling, yMax, threshold),
                      new Noise(image, verdeling, yMin, xMax, yMax, threshold));
        } else {
            // execute!
            computeDirectly(xMin, yMin, xMax, yMax);
        }
    }

    private void computeDirectly(int xMin, int yMin, int xMax, int yMax) {
        Random generator = new Random();
        for (int x = xMin; x < xMax; x++) {
            for (int y = yMin; y < yMax; y++) {
                //image.setPaint(new Color(generator.nextInt()));
                int rgb = generator.nextInt();
                int red = (rgb >> 16) & 0xFF;
                int green = (rgb >> 8) & 0xFF;
                int blue = rgb & 0xFF;
                // rescale each channel logarithmically
                red = (int) Math.round((Math.log(255L) / Math.log((double) red)) * 255);
                green = (int) Math.round((Math.log(255L) / Math.log((double) green)) * 255);
                blue = (int) Math.round((Math.log(255L) / Math.log((double) blue)) * 255);
                // pack the three channels back into one RGB int
                int rgbSat = red;
                rgbSat = (rgbSat << 8) + green;
                rgbSat = (rgbSat << 8) + blue;
                image.setRGB(x, y, rgbSat);
            }
        }
        // outline the region this task handled, to visualize the split
        Graphics2D g2D = image.createGraphics();
        g2D.setPaint(Color.RED);
        g2D.drawRect(xMin, yMin, xMax - xMin, yMax - yMin);
        g2D.dispose();
    }
}
When generating a 6000 * 6000 image the results are:
Single thread: 9.4 s @ 25% CPU load
Multi thread: 16.5 s @ 80%-90% CPU load
(Core 2 Quad Q9450)
Why is the multi-threaded version slower?
How do I fix this?
First of all, F/J is a niche product. If you don't have a HUGE array that you process as a DAG, then you're using the wrong product. Sure, F/J can make use of multiple processors, but so can a simple multi-threaded approach, without all the overhead of F/J.
Try using four threads and just give each a quarter of the work directly.
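A minimal sketch of that suggestion, assuming a hypothetical `QuarterNoise` class that is not part of the question's code: the image is cut into vertical strips and each plain thread fills one strip. The per-pixel work is simplified to a random RGB value, and the image size in `main` is kept small so the sketch runs quickly.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class QuarterNoise {

    /** Fills the columns [xMin, xMax) of the image with random RGB values. */
    static void fillStrip(BufferedImage image, int xMin, int xMax) {
        // each thread uses its own generator, so there is no shared Random
        ThreadLocalRandom random = ThreadLocalRandom.current();
        for (int x = xMin; x < xMax; x++) {
            for (int y = 0; y < image.getHeight(); y++) {
                image.setRGB(x, y, random.nextInt() & 0xFFFFFF);
            }
        }
    }

    /** Splits the image into threadCount vertical strips, one thread per strip. */
    static void fill(BufferedImage image, int threadCount) {
        List<Thread> threads = new ArrayList<>();
        int width = image.getWidth();
        for (int i = 0; i < threadCount; i++) {
            int xMin = i * width / threadCount;       // inclusive
            int xMax = (i + 1) * width / threadCount; // exclusive
            Thread t = new Thread(() -> fillStrip(image, xMin, xMax));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until every strip is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // the question uses 6000 * 6000; a smaller image keeps this demo short
        BufferedImage image = new BufferedImage(1000, 1000, BufferedImage.TYPE_INT_RGB);
        long start = System.nanoTime();
        fill(image, 4);
        System.out.printf("4 threads: %.3f s%n", (System.nanoTime() - start) / 1e9);
    }
}
```

Whether this beats the single-threaded run still depends on how the per-pixel calls behave when several threads hit the same image, so time it against the single-threaded version on the same machine.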
This is the way F/J was meant to be used:
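A minimal sketch of that pattern, assuming a hypothetical `Sum` task that adds up a `long[]`: each task splits its range in half, forks one half, computes the other half itself, and then joins. The class name, the `THRESHOLD` value, and the summing workload are illustrative assumptions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class Sum extends RecursiveTask<Long> {
    // assumed cut-off below which a range is summed sequentially
    private static final int THRESHOLD = 10_000;

    private final long[] array;
    private final int low;  // inclusive
    private final int high; // exclusive

    public Sum(long[] array, int low, int high) {
        this.array = array;
        this.low = low;
        this.high = high;
    }

    @Override
    protected Long compute() {
        if (high - low <= THRESHOLD) {
            // leaf of the tree: do the work directly
            long sum = 0;
            for (int i = low; i < high; i++) {
                sum += array[i];
            }
            return sum;
        }
        int mid = low + (high - low) / 2;
        Sum left = new Sum(array, low, mid);
        Sum right = new Sum(array, mid, high);
        left.fork();                     // queue the left half for another worker
        long rightAns = right.compute(); // work on the right half in this thread
        long leftAns = left.join();      // wait for the left half
        return leftAns + rightAns;
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        long total = ForkJoinPool.commonPool().invoke(new Sum(data, 0, data.length));
        System.out.println(total); // prints 499999500000
    }
}
```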
When you don’t walk down the leaves of a structured tree, then all bets are off.