The Archive Base

Editorial Team
Asked: May 30, 2026


I am processing frames in a video and displaying them live (in real time). The algorithm is fast, but I am wondering if there are any optimizations that would make it even more seamless. I don’t know which functions in my algorithm take the most time; my guess is the sqrt() function, because apparently it does some lookups, but I am not sure.

This is my algorithm:

IplImage *videoFrame = cvCreateImage(cvSize(bufferWidth, bufferHeight), IPL_DEPTH_8U, 4);
videoFrame->imageData = (char*)bufferBaseAddress;
int channels = videoFrame->nChannels;
int widthStep = videoFrame->widthStep;
int width = videoFrame->width;
int height = videoFrame->height;

for(int i=0;i<height;i++){

    uchar *col = ((uchar *)(videoFrame->imageData + i*widthStep));

    for(int j=0;j<width;j++){

        double pRed     = col[j*channels + 0];                      
        double pGreen   = col[j*channels + 1];       
        double pBlue    = col[j*channels + 2];       

        double dRed     = green.val[0] - pRed;
        double dGreen   = green.val[1] - pGreen;
        double dBlue    = green.val[2] - pBlue;

        double sDRed    = dRed * dRed;
        double sDGreen  = dGreen * dGreen;
        double sDBlue   = dBlue * dBlue;


        double sum = sDRed + sDGreen + sDBlue;

        double euc = sqrt(sum);
        //NSLog(@"%f %f %f", pRed, pGreen, pBlue);

        if (euc < threshold) {
            col[j*channels + 0] = white.val[0];
            col[j*channels + 1] = white.val[1];
            col[j*channels + 2] = white.val[2];
        }

    }
}

Thanks!

UPDATE
OK, so what this does is loop through every pixel in the image and calculate the Euclidean distance between the color of the pixel and the green color. So, overall, this is a green screen algorithm.

I did some benchmarks, and the frame rate without this algorithm is 30.0fps. With this algorithm, it falls to about 8fps. But the majority of the drop comes from col[j*channels + 0]; if the algorithm does nothing else and only accesses the array elements, it drops to about 10fps.

UPDATE 2
OK, this is interesting. I was removing random lines from inside the double loop to see what causes the biggest overhead, and this is what I found: creating variables on the stack causes a HUGE drop in FPS. Consider this example:

for(int i=0;i<height;i++){

    uchar *col = ((uchar *)(data + i*widthStep));

    for(int j=0;j<width;j++){

        double pRed     = col[j*channels + 0];                      
        double pGreen   = col[j*channels + 1];       
        double pBlue    = col[j*channels + 2];       

    }
}

This drops the fps to 11-ish.

Now this on the other hand:

for(int i=0;i<height;i++){

    uchar *col = ((uchar *)(data + i*widthStep));

    for(int j=0;j<width;j++){

        col[j*channels + 0];                      
        col[j*channels + 1];       
        col[j*channels + 2];       

    }
}

doesn’t drop the FPS at all! The FPS stays at a pretty 30.0. Thought I should update this and let you guys know that this is the real bottleneck: making variables on the stack. I wonder if I inline everything I might get a pure 30.0fps.

Nvm…maybe the expressions that aren’t assigned to a var aren’t even evaluated.
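One way to check that suspicion is to make the reads observable, so the compiler cannot discard them as dead code. The following is only a minimal sketch: `checksum_frame` is a hypothetical helper (not part of OpenCV), and it assumes the same `data`, `widthStep` and `channels` layout as the loops above. Timing a loop that returns a value like this should give a fairer picture of what the memory reads alone cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical benchmark helper: sums the first three channels of every
   pixel. Because the sum is returned, the reads have a visible effect and
   cannot be removed as unused expressions. */
uint32_t checksum_frame(const unsigned char *data, int width, int height,
                        int widthStep, int channels)
{
    assert(data != NULL && channels >= 3);

    uint32_t sum = 0;
    for (int i = 0; i < height; i++) {
        const unsigned char *col = data + (size_t)i * (size_t)widthStep;
        for (int j = 0; j < width; j++) {
            sum += col[0];
            sum += col[1];
            sum += col[2];
            col += channels;   /* step to the next pixel */
        }
    }
    return sum;
}
```

The caller has to use the returned value (print it or store it somewhere), otherwise the whole call may still be optimized away.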


1 Answer

    Editorial Team
    Added an answer on May 30, 2026 at 9:07 pm

    sqrt is a monotonically increasing function, and you appear to only be using it in a threshold test.

    Due to monotonicity, sqrt(sum) < threshold is equivalent to sum < threshold * threshold (assuming threshold is positive).

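    No more expensive square root, and the compiler can usually move the multiplication outside the loop. Because the loop writes through a uchar pointer, which is allowed to alias any object, the compiler may not be able to prove that threshold stays unchanged, so it is safer to compute threshold * threshold once in a local variable before the loop.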
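As a rough sketch of that rewrite, the test can be split into a one-time squaring step and a per-pixel comparison. The function names `squared_threshold` and `within_threshold` are made up for illustration, and the code assumes `threshold` is non-negative and `sum` is a sum of squares. Results may differ from the sqrt version only by floating-point rounding at the exact boundary.

```c
#include <assert.h>
#include <stdbool.h>

/* Squares the threshold once, before the pixel loop.
   Assumes threshold >= 0; for a negative threshold the squared test
   would no longer be equivalent to the sqrt test. */
double squared_threshold(double threshold)
{
    assert(threshold >= 0.0);
    return threshold * threshold;
}

/* Same decision as sqrt(sum) < threshold for sum >= 0, without the sqrt. */
bool within_threshold(double sum, double thresholdSq)
{
    return sum < thresholdSq;
}
```

Inside the inner loop, `if (euc < threshold)` would then become `if (within_threshold(sum, thresholdSq))`, with `thresholdSq` computed once per frame.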


    As a next step, you can remove the expensive multiply j * channels from the inner loop. The compiler should be smart enough to do it only once and use the result three times, but it’s still a multiply that the rest of the calculation depends on, so it hurts pipelining.

    Remember that a multiply is the same as repeated addition? Normally doing more operations is more expensive, but in this case you already have the repetition part, due to the loop. So use:

    for(int j=0;j<width;j++){
        double pRed     = col[0];
        double pGreen   = col[1];
        double pBlue    = col[2];
    
        double dRed     = green.val[0] - pRed;
        double dGreen   = green.val[1] - pGreen;
        double dBlue    = green.val[2] - pBlue;
    
        double sDRed    = dRed * dRed;
        double sDGreen  = dGreen * dGreen;
        double sDBlue   = dBlue * dBlue;
    
    
        double sum = sDRed + sDGreen + sDBlue;
    
        //NSLog(@"%f %f %f", pRed, pGreen, pBlue);
    
        if (sum < threshold * threshold) {
            col[0] = white.val[0];
            col[1] = white.val[1];
            col[2] = white.val[2];
        }
    
        col += channels;
    }
    

    Next, you have expensive conversions between uchar and double. These aren’t needed for a threshold test:

    int j = width;   // assumes width > 0: a do-while body always runs at least once
    do {
        int_fast16_t const pRed   = col[0];
        int_fast16_t const pGreen = col[1];
        int_fast16_t const pBlue  = col[2];
    
        int_fast32_t const dRed   = green.val[0] - pRed;
        int_fast32_t const dGreen = green.val[1] - pGreen;
        int_fast32_t const dBlue  = green.val[2] - pBlue;
    
        int_fast32_t const sDRed   = dRed * dRed;
        int_fast32_t const sDGreen = dGreen * dGreen;
        int_fast32_t const sDBlue  = dBlue * dBlue;
    
        int_fast32_t const sum = sDRed + sDGreen + sDBlue;
    
        //NSLog(@"%f %f %f", pRed, pGreen, pBlue);
    
        if (sum < threshold * threshold) {
            col[0] = white.val[0];
            col[1] = white.val[1];
            col[2] = white.val[2];
        }
    
        col += channels;
    } while (--j);
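Putting the three steps together, a whole-frame version might look like the sketch below. It is not the asker's actual code: `chroma_key_frame`, the `key` and `fill` arrays, and the integer `threshold` are assumptions made for illustration, standing in for `green.val`, `white.val` and the original threshold. It takes a raw pixel buffer such as `videoFrame->imageData` rather than depending on OpenCV types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replaces every pixel whose colour is within `threshold` of `key` with
   `fill`, using integer arithmetic only. Returns the number of pixels
   replaced. Only the first three channels of each pixel are touched. */
size_t chroma_key_frame(uint8_t *data, int width, int height,
                        size_t widthStep, int channels,
                        const uint8_t key[3], const uint8_t fill[3],
                        int threshold)
{
    /* Distances between 8-bit colours never exceed about 441.7, so a
       larger threshold adds nothing and is rejected here. */
    assert(data != NULL && channels >= 3);
    assert(threshold >= 0 && threshold <= 442);

    /* Hoisted out of the loops: squared threshold and key components. */
    const int32_t thresholdSq = (int32_t)threshold * threshold;
    const int32_t k0 = key[0], k1 = key[1], k2 = key[2];
    size_t replaced = 0;

    for (int i = 0; i < height; i++) {
        uint8_t *px = data + (size_t)i * widthStep;
        for (int j = 0; j < width; j++, px += channels) {
            const int32_t d0 = k0 - px[0];
            const int32_t d1 = k1 - px[1];
            const int32_t d2 = k2 - px[2];
            /* At most 3 * 255 * 255 = 195075, well inside int32_t. */
            const int32_t sum = d0 * d0 + d1 * d1 + d2 * d2;
            if (sum < thresholdSq) {
                px[0] = fill[0];
                px[1] = fill[1];
                px[2] = fill[2];
                replaced++;
            }
        }
    }
    return replaced;
}
```

A plain for loop is used instead of the do-while so that an empty frame (width or height of zero) is handled without a special case.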
    