I’m working on a little Android app to stream some camera footage (as a series of JPEGs) to my computer. With no processing, the frame buffer receives camera preview images at about 18 fps. When I add in
YuvImage yuv = new YuvImage(data, ImageFormat.NV21, dimensions.width, dimensions.height, null);
yuv.compressToJpeg(new Rect(0, 0, dimensions.width, dimensions.height), 40, out);
the frame rate drops to about 7 fps. So I thought I’d write my own JPEG encoder in C and speed it up a bit. Well, I was in for a surprise: I’m now getting 0.4 fps!
So now I need to profile and optimize my C code, but I don’t really know where to begin. I’m using these GCC flags:
-Wall -std=c99 -ffast-math -O3 -funroll-loops
Is there anything I can improve there?
Other than that, my JPEG encoder is just a straightforward implementation: write header info, write quantization and Huffman tables, then entropy encode the data. The DCT uses AA&N’s method, which I believe is the fastest way of doing this.
Perhaps there is a problem with the JNI overhead?
I’m allocating the memory in Java using:
frame_buffer = ByteBuffer.allocate(raw_preview_buffer_size).array();
jpeg_buffer = ByteBuffer.allocate(10000000).array();
and then pulling it in with this code (pardon the spaghetti at the moment):
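void Java_com_nechtan_limelight_activities_CameraPreview_handleFrame(JNIEnv* env, jobject this, jbyteArray nv21data, jbyteArray jpeg_buffer) {
    jboolean isCopyNV21;
    jboolean isCopyJPEG;
    int jpeg_size = 0;
    /* isCopy* report whether the VM handed back a copy instead of the original array. */
    jbyte* nv21databytes = (*env)->GetByteArrayElements(env, nv21data, &isCopyNV21);
    jbyte* jpeg_buffer_bytes = (*env)->GetByteArrayElements(env, jpeg_buffer, &isCopyJPEG);
    if (nv21databytes != NULL) {
        if (jpeg_buffer_bytes != NULL) {
            jpeg_size = compressToJpeg((UCHAR*) nv21databytes, (UCHAR*) jpeg_buffer_bytes, 640, 480);
            /* Mode 0 copies the JPEG bytes back to the Java array if a copy was made. */
            (*env)->ReleaseByteArrayElements(env, jpeg_buffer, jpeg_buffer_bytes, 0);
        }
        else {
            __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "JPEG data null!");
        }
        /* The input is read-only, so JNI_ABORT frees it without copying back. */
        /* Released on both paths so it is not leaked when the JPEG buffer is null. */
        (*env)->ReleaseByteArrayElements(env, nv21data, nv21databytes, JNI_ABORT);
    }
    else {
        __android_log_print(ANDROID_LOG_DEBUG, DEBUG_TAG, "NV21 data null!");
        /* Nothing was written, so discard the output buffer if it was obtained. */
        if (jpeg_buffer_bytes != NULL) {
            (*env)->ReleaseByteArrayElements(env, jpeg_buffer, jpeg_buffer_bytes, JNI_ABORT);
        }
    }
}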
Am I doing something inefficient here? What is a good way to profile JNI code?
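As a first step short of a real profiler, the native side can be timed by hand to see whether the time goes into the JNI array calls or into the encoder itself. This is only a minimal sketch, assuming POSIX clock_gettime with CLOCK_MONOTONIC is available (Android's libc provides it); stage_timer, stage_add, and stage_avg_ms are hypothetical helper names invented for illustration, not part of the NDK.

```c
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime when building with -std=c99 */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds; not affected by wall-clock adjustments. */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Hypothetical accumulator for one stage (e.g. "get arrays", "encode", "release"). */
typedef struct {
    int64_t total_ns;  /* summed duration of all recorded calls */
    int64_t calls;     /* number of recorded calls */
} stage_timer;

/* Record one measured interval for a stage. */
static void stage_add(stage_timer *t, int64_t start_ns, int64_t end_ns) {
    assert(end_ns >= start_ns);
    t->total_ns += end_ns - start_ns;
    t->calls++;
}

/* Average duration per call in milliseconds; 0.0 if nothing was recorded. */
static double stage_avg_ms(const stage_timer *t) {
    if (t->calls == 0) {
        return 0.0;
    }
    return (double)t->total_ns / (double)t->calls / 1e6;
}
```

In handleFrame, one stage_timer per stage would be kept in a static variable, each of GetByteArrayElements, compressToJpeg, and ReleaseByteArrayElements would be bracketed with a pair of now_ns() calls, and the averages logged with __android_log_print every few dozen frames. If the encode stage dominates, JNI overhead is not the bottleneck.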
Other than those things, the only thing I can think of is that I’m going to have to read about NEON and vectorize this stuff. Ugh…
Try using the built-in encoder: