Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8829497
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 14, 20262026-06-14T07:50:46+00:00 2026-06-14T07:50:46+00:00

I have the following function double single_channel_add(int patch_top_left_row, int patch_top_left_col, int image_hash_key, Mat* preloaded_images,

  • 0

I have the following function

double single_channel_add(int patch_top_left_row, int patch_top_left_col, 
        int image_hash_key, 
        Mat* preloaded_images,
        int* random_values){

    int first_pixel_row = patch_top_left_row + random_values[0];
    int first_pixel_col = patch_top_left_col + random_values[1];
    int second_pixel_row = patch_top_left_row + random_values[2];
    int second_pixel_col = patch_top_left_col + random_values[3];

    int channel = random_values[4];

    Vec3b* first_pixel_bgr = preloaded_images[image_hash_key].ptr<Vec3b>(first_pixel_row, first_pixel_col);
    Vec3b* second_pixel_bgr = preloaded_images[image_hash_key].ptr<Vec3b>(second_pixel_row, second_pixel_col);

    return (*first_pixel_bgr)[channel] + (*second_pixel_bgr)[channel];
}

Which is called about one and a half million times with different values for patch_top_left_row and patch_top_left_col. This takes about 2 seconds to run, now when I change the calculation of first_pixel_row etc to not use the arguments but hard coded numbers instead (shown below), the thing runs sub second and I don’t know why. Is the compiler doing something smart here ( I am using gcc cross compiler)?

double single_channel_add(int patch_top_left_row, int patch_top_left_col, 
        int image_hash_key, 
        Mat* preloaded_images,
        int* random_values){

        int first_pixel_row = 5 + random_values[0];
        int first_pixel_col = 6 + random_values[1];
        int second_pixel_row = 8 + random_values[2];
        int second_pixel_col = 10 + random_values[3];
            int channel = random_values[4];

    Vec3b* first_pixel_bgr = preloaded_images[image_hash_key].ptr<Vec3b>(first_pixel_row, first_pixel_col);
    Vec3b* second_pixel_bgr = preloaded_images[image_hash_key].ptr<Vec3b>(second_pixel_row, second_pixel_col);

    return (*first_pixel_bgr)[channel] + (*second_pixel_bgr)[channel];
}

EDIT:

I have pasted the assembly from the two versions of the function
using arguments: http://pastebin.com/tpCi8c0F
using constants: http://pastebin.com/bV0d7QH7

EDIT:

After compiling with -O3 I get the following clock ticks and speeds:

using arguments: 1990000 ticks and 1.99seconds
using constants: 330000 ticks and 0.33seconds

EDIT:
using argumenst with -03 compilation: http://pastebin.com/fW2HCnHc
using constant with -03 compilation: http://pastebin.com/FHs68Agi

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-14T07:50:47+00:00Added an answer on June 14, 2026 at 7:50 am

    On the x86 platform there are instructions that very quickly add small integers to a register. These instructions are the lea (aka ‘load effective address’) instructions and they are meant for computing address offsets for structures and the like. The small integer being added is actually part of the instruction. Smart compilers know that these instructions are very quick and use them for addition even when addresses are not involved.

    I bet if you changed the constants to some random value that was at least 24 bits long that you would see much of the speedup disappear.

    Secondly those constants are known values. The compiler can do a lot to arrange for those values to end up in a register in the most efficient way possible. With an argument, unless the argument is passed in a register (and I think your function has too many arguments for that calling convention to be used) the compiler has no choice but to fetch the number from memory using a stack offset load instruction. That isn’t a particularly slow instruction or anything, but with constants the compiler is free to do something much faster than may involve simply fetching the number from the instruction itself. The lea instructions are simply the most extreme example of this.

    Edit: Now that you’ve pasted the assembly things are much clearer

    In the non-constant code, here is how the add is done:

    addl    -68(%rbp), %eax
    

    This fetches a value from the stack an offset -68(%rpb) and adds it to the %eax% register.

    In the constant code, here is how the add is done:

    addl    $5, %eax
    

    and if you look at the actual numbers, you see this:

    0138 83C005
    

    It’s pretty clear that the constant being added is encoded directly into the instruction as a small value. This is going to be much faster to fetch than fetching a value from a stack offset for a number of reasons. First it’s smaller. Secondly, it’s part of an instruction stream with no branches. So it will be pre-fetched and pipelined with no possibility for cache stalls of any kind.

    So while my surmise about the lea instruction wasn’t correct, I was still on the right track. The constant version uses a small instruction specifically oriented towards adding a small integer to a register. The non-constant version has to fetch an integer that may be of indeterminate size (so it has to fetch ALL the bits, not just the low ones) from a stack offset (which adds in an additional add to compute the actual address from the offset and stack base address).

    Edit 2: Now that you’ve posted the -O3 results

    Well, it’s much more confusing now. It’s apparently inlined the function in question and it jumps around a whole ton between the code for the inlined function and the code for the calling function. I’m going to need to see the original code for the whole file to make a proper analysis.

    But what I strongly suspect is happening now is that the unpredictability of the values retrieved from get_random_number_in_range is severely limiting the optimization options available to the compiler. In fact, it looks like in the constant version it doesn’t even bother to call get_random_number_in_range because the value is tossed out and never used.


    I’m assuming that the values of patch_top_left_row and patch_top_left_col are generated in a loop somewhere. I would push this loop into this function. If the compiler knows the values are generated as part of a loop, there are a very large number of optimization options open to it. In the extreme case it could use some of the SIMD instructions that are part of the various SSE or 3dnow! instruction suites to make things a whole ton faster than even the version you have that uses constants.

    The other option would be to make this function inline, which would hint to the compiler that it should try inserting it into the loop in which it’s called. If the compiler takes the hint (this function is a bit largish, so the compiler might not) it will have much the same effect as if you’d stuffed the loop into the function.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have the following function: func fitrange(a, x, b int) int { if a
I have a Policy class with the following function: double Policy::meanResponse(); Suppose I have
I have the following function that is used to round a double value in
I have the following function: std::vector<double>residuals; std::cout << Print_res(std::cout); std::ostream& Print_res(std::ostream& os) const {
Let's say I have the following function: Function myFunction(j As Integer) As Double myFunction
I have following function in one Activity public void AppExit() { Editor edit =
In my MVC application in Controller i have following function to add and focus
have the following function on my collection : getFiltered: function (status, city) { return
I have the following function set up to sequentially fade in divs (with class
I have the following function to remove a DOM element div, $('#emDiv').on(click, ':button[data-emp-del=true]', function

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.