This is a small project for testing the pixel-level manipulation performance of NME across different builds (Windows C++, Flash).
It uses BitmapData.setPixel to modify the pixels one by one (320×240 pixels every frame). The C++ build runs at 22 FPS, while the Flash build runs at around 100 FPS. What's the reason for the huge performance drop in the C++ build compared to Flash? How could I improve the code to get a higher FPS with the C++ build?
Mandelbrot.hx
import nme.display.Sprite;
import nme.display.Bitmap;
import nme.display.BitmapData;
import nme.text.TextField;
import nme.events.Event;
import nme.events.TimerEvent;
import nme.utils.Timer;
import nme.geom.Matrix;
import nme.geom.Rectangle;
import nme.utils.ByteArray;

class Mandelbrot
{
    public static function main() : Void
    {
        new Mandelbrot();
    }

    public var pixels:Array<Array<Int>>;
    public var colorModifier:Int;
    private var bitmapData:BitmapData;
    private var bigBitmapData:BitmapData;
    private var fps:TextField;
    private var width:Int;
    private var height:Int;
    private var matrix:Matrix;

    public function new()
    {
        width = 320; //Std.int(flash.Lib.current.stage.stageWidth/2);
        height = 240; //Std.int(flash.Lib.current.stage.stageHeight/2);
        var scale:Float = 2;//flash.Lib.current.stage.stageWidth/width;
        matrix = new Matrix();
        matrix.scale(scale, scale);
        var setBitmap:Bitmap = new Bitmap();
        bitmapData = new BitmapData( width , height , false , 0x000000 );
        bigBitmapData = new BitmapData( nme.Lib.current.stage.stageWidth , nme.Lib.current.stage.stageHeight , false , 0x000000 );
        setBitmap.bitmapData = bigBitmapData;
        nme.Lib.current.addChild( setBitmap );
        var maxIterations:Int = 128;
        pixels = new Array();
        var beforeTime = nme.Lib.getTimer();
        var xtemp;
        var iteration;
        var x0:Float = 0;
        var y0:Float = 0;
        for(ix in 0...width) {
            pixels[ix] = new Array();
            for(iy in 0...height) {
                x0 = 0;
                y0 = 0;
                iteration = 128;
                while ( x0*x0 + y0*y0 <= 4 && iteration > 0 )
                {
                    xtemp = x0*x0 - y0*y0 + (ix-14*5000)/50000;
                    y0 = 2*x0*y0 + (iy-(height/0.6))/50000;
                    x0 = xtemp;
                    iteration--;
                }
                pixels[ix][iy] = iteration;
            }
        }
        var afterTime = nme.Lib.getTimer();
        var tf = new TextField();
        tf.width = 400;
        tf.text = "Generating fractal took "+(afterTime-beforeTime)+" ms";
        nme.Lib.current.addChild(tf);
        fps = new TextField();
        fps.width = 400;
        fps.y = 10;
        fps.text = "FPS: ";
        nme.Lib.current.addChild(fps);
        colorModifier = 2;
        var timer:haxe.Timer = new haxe.Timer(10);
        runLoop();
        timer.run = runLoop;
    }

    public function runLoop() {
        var r:Int=0, b:Int=0, g:Int=0;
        var pixel:Int = 0;
        var beforeTime = nme.Lib.getTimer();
        for(iy in 0...height) {
            for(ix in 0...width) {
                pixel = pixels[ix][iy];
                r = pixel + colorModifier;
                g = pixel + colorModifier + r;
                b = pixel + colorModifier + g;
                bitmapData.setPixel(ix, iy, (r<<16 | g<<8 | b));
            }
        }
        bigBitmapData.draw(bitmapData, matrix, null, null, null, false);
        var afterTime = nme.Lib.getTimer();
        fps.text = "FPS: "+Math.round(1000/(afterTime-beforeTime));
        colorModifier += 2;
        if(colorModifier > 65530)
            colorModifier = 0;
    }
}
Mandelbrot.nmml
<?xml version="1.0" encoding="utf-8"?>
<project>
    <app
        file="Mandelbrot.hx"
        title="Mandelbrot sample"
        package="org.haxe.nme.mandelbrot"
        version="1.0.0"
        company="nme"
        main="Mandelbrot"
    />
    <window
        width="640"
        height="480"
        orientation="landscape"
        fps="60"
        background="0xffffff"
        resizeable="true"
        hardware="true"
    />
    <classpath name="." />
    <haxelib name="nme" />
    <ndll name="std" />
    <ndll name="regexp" />
    <ndll name="zlib" />
    <ndll name="nme" haxelib="nme" />
    <setenv name="SHOW_CONSOLE"/>
</project>
Look into the nme.Memory API. The idea is to create a ByteArray with the correct size (or get it from a BitmapData), select it as the current virtual memory space and manipulate its bytes directly.
You'll get an approximately 10x speed boost with Flash, and it should be way faster with the CPP target too. Don't forget to compile in Release mode, or method inlining will be disabled and performance will suffer a lot.
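To make the idea concrete, here is a minimal sketch of writing pixels straight into a flat byte buffer. It is only an illustration under stated assumptions: it uses haxe.io.Bytes as a stand-in for the ByteArray so that it runs without NME, and the class name, the render helper and the placeholder color formula are hypothetical. With NME you would presumably select the ByteArray once with the Memory class, write each pixel with its 32-bit setter, and hand the buffer to BitmapData.setPixels at the end of the frame; check those calls, and the channel byte order the bitmap expects, against your NME version.

```haxe
import haxe.io.Bytes;

// Hypothetical sketch: one 32-bit value per pixel in a flat byte buffer.
class DirectBufferSketch {
    static inline var WIDTH = 320;
    static inline var HEIGHT = 240;

    // Writes every pixel of one frame directly into `buffer`,
    // which must hold WIDTH * HEIGHT * 4 bytes.
    static function render(buffer:Bytes, colorModifier:Int):Void {
        for (y in 0...HEIGHT) {
            for (x in 0...WIDTH) {
                // Placeholder color (assumption); the real value would be
                // derived from the stored iteration count.
                var color = (x + y + colorModifier) & 0xFFFFFF;
                // 4 bytes per pixel, rows stored one after another.
                // setInt32 stores the value in little-endian order.
                buffer.setInt32((y * WIDTH + x) * 4, color);
            }
        }
    }

    static function main() {
        var buffer = Bytes.alloc(WIDTH * HEIGHT * 4);
        render(buffer, 2);
        // Read one pixel back to check the addressing.
        trace(StringTools.hex(buffer.getInt32((10 * WIDTH + 5) * 4), 6));
    }
}
```

The point of the sketch is that each pixel costs one plain memory write instead of one setPixel call; the whole buffer is pushed to the bitmap once per frame.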
Basic usage example (untested code):
Keep in mind this is a very basic code example. In your case, you'd need to adapt it and actually use a double sized ByteArray because you need to store the iteration count too. Nested loops can be optimized in your main loop and you can avoid a lot of extra index/address computations:
And this is it. If you really don't want to use Alchemy Opcodes on the Flash target, the next fastest way to blit pixels is to use getVector()/setVector() from the BitmapData class. But it's really not as fast.
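The double sized buffer and the flattened loop mentioned above could look roughly like the following sketch. It is not the code from this answer, just one possible layout under assumptions: the first half of the buffer holds one 32-bit iteration count per pixel, the second half holds the colors, and haxe.io.Bytes again stands in for the ByteArray so the example runs without NME. The class name, the helper functions and the placeholder iteration values are hypothetical; the color formula is the one from the question's runLoop.

```haxe
import haxe.io.Bytes;

// Hypothetical sketch: iteration counts and colors share one buffer.
class PackedBufferSketch {
    static inline var WIDTH = 320;
    static inline var HEIGHT = 240;

    // Fills the first half of `mem` with one 32-bit iteration count per pixel.
    // The values are placeholders standing in for the Mandelbrot loop.
    static function storeIterations(mem:Bytes):Void {
        var addr = 0;
        for (i in 0...(WIDTH * HEIGHT)) {
            mem.setInt32(addr, i % 129); // counts in 0..128, as in the question
            addr += 4;
        }
    }

    // Colors every pixel with a single loop and two running addresses,
    // so (y * WIDTH + x) * 4 is never recomputed per pixel.
    static function colorize(mem:Bytes, colorModifier:Int):Void {
        var src = 0;                  // start of the iteration counts
        var dst = WIDTH * HEIGHT * 4; // start of the color half
        var end = dst;
        while (src < end) {
            var pixel = mem.getInt32(src);
            var r = pixel + colorModifier;
            var g = pixel + colorModifier + r;
            var b = pixel + colorModifier + g;
            mem.setInt32(dst, (r << 16) | (g << 8) | b);
            src += 4;
            dst += 4;
        }
    }

    static function main() {
        var mem = Bytes.alloc(WIDTH * HEIGHT * 4 * 2);
        storeIterations(mem);
        colorize(mem, 2);
        // Color of the first pixel, read from the start of the color half.
        trace(mem.getInt32(WIDTH * HEIGHT * 4));
    }
}
```

Storing the counts row by row also means the per-frame loop reads memory sequentially, unlike the question's pixels[ix][iy] lookup inside a row-major loop.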