I have some code that will stream video from a camera at 720p and 24fps. I am trying to capture this stream in code and eventually create a video of it by throwing together compressed jpegs into mjpeg or the like. The issue I’m having is that this overall code is not fast enough to create something at 24 fps or .04 seconds per image.
Using Stopwatch, I found that the interior for loop takes .000000000022 seconds per iteration.
The exterior for loop takes .0000077 seconds per iteration.
The entire function, from start to image save, takes .21 seconds per run.
Calculations from the interior loop to complete an image:
.000000000022 x 640 = .00000001408 seconds
.00000001408 x 360 = .0000050688 seconds
Calculation from the exterior loop to complete an image:
.0000077 x 360 = .002772 seconds
If I could create an image in those times I would be set, but the code as a whole takes .21 seconds to complete.
temp_byte1 = main_byte1;
temp_byte2 = main_byte2;
timer1.Reset();
timer1.Start();
Bitmap mybmp = new Bitmap(1280, 720);
BitmapData BPD = mybmp.LockBits(new Rectangle(0, 0, 1280, 720), ImageLockMode.WriteOnly, mybmp.PixelFormat);
IntPtr xptr = BPD.Scan0;
IntPtr yptr = BPD.Scan0;
yptr = new IntPtr(yptr.ToInt64() + (1280 * 720 * 2));
int bytes = Math.Abs(BPD.Stride);
byte[][] rgb = new byte[720][];
int Y1, Y2, Y3, Y4, Y5, Y6, Y7, Y8;
int U1, U2, V1, V2, U3, U4, V3, V4;
for (int one = 0; one < 360; one++)
{
    timer2.Reset();
    timer2.Start();
    rgb[one] = new byte[bytes];
    rgb[360 + one] = new byte[bytes];
    for (int two = 0; two < 640; two++)
    {
        timer3.Reset();
        timer3.Start();
        U1 = temp_byte1[one * 2560 + 4 * two + 0];
        Y1 = temp_byte1[one * 2560 + 4 * two + 1];
        V1 = temp_byte1[one * 2560 + 4 * two + 2];
        Y2 = temp_byte1[one * 2560 + 4 * two + 3];
        U2 = temp_byte2[one * 2560 + 4 * two + 0];
        Y3 = temp_byte2[one * 2560 + 4 * two + 1];
        V2 = temp_byte2[one * 2560 + 4 * two + 2];
        Y4 = temp_byte2[one * 2560 + 4 * two + 3];
        RGB_Conversion(Y1, U1, V1, two * 8 + 0, rgb[one]);
        RGB_Conversion(Y2, U1, V1, two * 8 + 4, rgb[one]);
        RGB_Conversion(Y3, U2, V2, two * 8 + 0, rgb[(360 + one)]);
        RGB_Conversion(Y4, U2, V2, two * 8 + 4, rgb[(360 + one)]);
        timer3.Stop();
        timer3_[two] = timer3.Elapsed;
    }
    Marshal.Copy(rgb[one], 0, xptr, 5120);
    xptr = new IntPtr(xptr.ToInt64() + 5120);
    Marshal.Copy(rgb[(360 + one)], 0, yptr, 5120);
    yptr = new IntPtr(yptr.ToInt64() + 5120);
    timer2.Stop();
    timer2_[one] = timer2.Elapsed;
}
mybmp.UnlockBits(BPD);
mybmp.Save(GetDateTimeString("IP Pictures") + ".jpg", ImageFormat.Jpeg);
The code works and converts an incoming YUV422 byte array into a full-size JPEG, but I can't understand why there is such a discrepancy between the speed of the for loops and the speed of the entire code.
I moved the
byte[][] rgb = new byte[720][];
rgb[x] = new byte[bytes];
allocations to a global that is initialized at program startup instead of on each function call, but there was no measurable increase in speed.
UPDATE
RGB_Conversion takes in YUV values, converts them to RGB, and puts the result in the global array holding the values:
public void RGB_Conversion(int Y, int U, int V, int MULT, byte[] rgb)
{
    int C, D, E;
    int R, G, B;
    // create the params for rgb conversion
    C = Y - 16;
    D = U - 128;
    E = V - 128;
    //R = clamp((298 x C + 409 x E + 128)>>8)
    //G = clamp((298 x C - 100 x D - 208 x E + 128)>>8)
    //B = clamp((298 x C + 516 x D + 128)>>8)
    R = (298 * C + 409 * E + 128) / 256;
    G = (298 * C - 100 * D - 208 * E + 128) / 256;
    B = (298 * C + 516 * D + 128) / 256;
    if (R > 255)
        R = 255;
    if (R < 0)
        R = 0;
    if (G > 255)
        G = 255;
    if (G < 0)
        G = 0;
    if (B > 255)
        B = 255;
    if (B < 0)
        B = 0;
    rgb[MULT + 3] = 255;
    rgb[MULT + 0] = (byte)B;
    rgb[MULT + 1] = (byte)G;
    rgb[MULT + 2] = (byte)R;
}
Firstly
You need to remove the Start/Stop stopwatch business from the inside of the loop.
Resetting the stopwatch 640 times in a tight loop is going to skew the figures. Better to use a profiler or measure coarse-grained performance.
Also, the presence of these statements might prevent compiler optimizations (loop tiling and loop unrolling look to be very good candidates here, but the JITter might not be able to use them, as the registers get clobbered by the calls to the stopwatch functions).
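To make "coarse-grained" concrete, here is a minimal sketch of timing the whole conversion from the outside rather than per iteration. The names CoarseTiming and AverageMilliseconds are made up for illustration; only Stopwatch itself is the real .NET class.

```csharp
using System;
using System.Diagnostics;

static class CoarseTiming
{
    // Hypothetical helper (not part of the question's code): runs 'work'
    // 'iterations' times and returns the mean wall-clock milliseconds per call.
    public static double AverageMilliseconds(Action work, int iterations)
    {
        work(); // warm-up call so JIT compilation is not part of the measurement

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            work();
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

You would wrap the frame conversion in a lambda and pass it in, once with the mybmp.Save call and once without. Averaging over many runs with no timers inside the loops should show which part the .21 seconds actually belongs to.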
Data structures:
I have a feeling that you should be able to use a ‘flat’ data structure, instead of newing up all the jagged arrays there. That said, I don’t know what API you are feeding it into, and I haven’t concentrated a lot on it.
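As a rough sketch of what I mean by ‘flat’: one byte[] for the whole frame, filled in place. This assumes the input is UYVY (as the indexing in the question suggests), a 32-bit BGRA output, and a bitmap stride of exactly 1280 * 4 = 5120 bytes (the question's code already relies on 5120). FlatFrame, ConvertHalf and WritePixel are names I made up; the arithmetic is copied from RGB_Conversion.

```csharp
using System;

static class FlatFrame
{
    public const int Width = 1280;
    public const int Height = 720;
    public const int Stride = Width * 4; // assumption: 32-bit BGRA, no row padding

    static byte Clamp(int v)
    {
        return (byte)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }

    // Writes one BGRA pixel at dst[offset .. offset + 3],
    // using the same integer formula as the question's RGB_Conversion.
    static void WritePixel(int y, int u, int v, byte[] dst, int offset)
    {
        int c = y - 16, d = u - 128, e = v - 128;
        dst[offset + 0] = Clamp((298 * c + 516 * d + 128) / 256);           // B
        dst[offset + 1] = Clamp((298 * c - 100 * d - 208 * e + 128) / 256); // G
        dst[offset + 2] = Clamp((298 * c + 409 * e + 128) / 256);           // R
        dst[offset + 3] = 255;                                              // A
    }

    // Converts 'rows' rows of UYVY data from src into the flat buffer dst,
    // starting at row 'firstRow' of the output frame.
    public static void ConvertHalf(byte[] src, byte[] dst, int firstRow, int rows)
    {
        for (int row = 0; row < rows; row++)
        {
            int s = row * Width * 2;           // 2 source bytes per pixel
            int o = (firstRow + row) * Stride; // 4 output bytes per pixel
            for (int pair = 0; pair < Width / 2; pair++, s += 4, o += 8)
            {
                int u = src[s], y1 = src[s + 1], v = src[s + 2], y2 = src[s + 3];
                WritePixel(y1, u, v, dst, o);
                WritePixel(y2, u, v, dst, o + 4);
            }
        }
    }
}
```

With a buffer of FlatFrame.Stride * FlatFrame.Height bytes allocated once at startup, you would call ConvertHalf(temp_byte1, frame, 0, 360) and ConvertHalf(temp_byte2, frame, 360, 360), then do a single Marshal.Copy(frame, 0, BPD.Scan0, frame.Length) instead of 720 small copies. I haven't measured this against your camera data, so treat it as a starting point.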
I do feel that making RGB_Conversion ‘just’ return the RGB parts, instead of letting it write into an array, might really give the compiler an edge to optimize things.
Other thoughts:
Look into RGB_Conversion (where/how is it defined?). Perhaps you can pull it inline.
Use an unchecked block to prevent all the array index manipulations from checking for overflow (this only matters if overflow checking is enabled for the project; C# arithmetic is unchecked by default).
Consider using /unsafe code to avoid bounds checking.
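Here is a sketch of the "return the pixel, pull it inline, unchecked" ideas combined, without resorting to /unsafe. PixelMath and YuvToArgb are illustrative names, not anything from the question. MethodImplOptions.AggressiveInlining (available from .NET 4.5) is only a hint, so the JIT may or may not inline the calls.

```csharp
using System;
using System.Runtime.CompilerServices;

static class PixelMath
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static int Clamp(int v)
    {
        return v < 0 ? 0 : (v > 255 ? 255 : v);
    }

    // Returns one pixel packed as 0xAARRGGBB instead of writing into an array.
    // On a little-endian machine that int is laid out in memory as B, G, R, A,
    // which is the byte order the question's RGB_Conversion writes.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int YuvToArgb(int y, int u, int v)
    {
        unchecked // explicit, in case the project is built with overflow checks on
        {
            int c = y - 16, d = u - 128, e = v - 128;
            int r = Clamp((298 * c + 409 * e + 128) / 256);
            int g = Clamp((298 * c - 100 * d - 208 * e + 128) / 256);
            int b = Clamp((298 * c + 516 * d + 128) / 256);
            return (255 << 24) | (r << 16) | (g << 8) | b;
        }
    }
}
```

The caller can store the results in an int[] with one element per pixel and hand that to the Marshal.Copy overload that takes an int[]. Whether this beats the byte-array version is something to confirm with a profiler rather than assume.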