The Archive Base

Editorial Team
Asked: June 6, 2026

The code without fission looks like this:

int check(int * res, char * map, int n, int * keys){
    int ret = 0;
    for(int i = 0; i < n; ++i){
        res[ret] = i;
        ret += map[hash(keys[i])];
    }
    return ret;
}

With fission:

int check(int * res, char * map, int n, int * keys, int * tmp){
    // tmp: caller-provided scratch array with room for n entries
    int ret = 0;
    for(int i = 0; i < n; ++i){
        tmp[i] = map[hash(keys[i])];
    }
    for(int i = 0; i < n; ++i){
        res[ret] = i;
        ret += tmp[i];
    }
    return ret;
}
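One cost of the fission version is the n-element tmp array, which is itself written to and read back from memory. A common variant (not something either post tested) is to strip-mine the fission: run both passes over fixed-size blocks so the scratch buffer stays small. A minimal sketch, assuming a hypothetical demo_hash() (a power-of-two mask standing in for the unspecified hash()), a hypothetical DEMO_MAP_SIZE, and a hypothetical block length BLOCK of 1024:

```c
#define DEMO_MAP_SIZE 4096  /* hypothetical map size; must be a power of two */
#define BLOCK 1024          /* hypothetical block length; keeps tmp small */

/* Hypothetical stand-in for the question's hash(): a power-of-two mask,
   like gethash() in the answer's test code. */
static unsigned demo_hash(int key)
{
    return (unsigned)key & (DEMO_MAP_SIZE - 1);
}

/* Strip-mined fission: same result as the fission version of check(), but
   the scratch buffer has BLOCK entries instead of n. map[] must hold only 0 or 1. */
int check_blocked(int *res, const char *map, int n, const int *keys)
{
    char tmp[BLOCK];
    int ret = 0;
    for (int base = 0; base < n; base += BLOCK) {
        int len = (n - base < BLOCK) ? (n - base) : BLOCK;

        /* Pass 1: independent random loads; nothing here depends on ret. */
        for (int j = 0; j < len; ++j)
            tmp[j] = map[demo_hash(keys[base + j])];

        /* Pass 2: branchless compaction from the buffered 0/1 flags. */
        for (int j = 0; j < len; ++j) {
            res[ret] = base + j;
            ret += tmp[j];
        }
    }
    return ret;
}
```

Whether this beats the full split depends on whether each block still exposes enough independent misses in pass 1, so BLOCK would need tuning on real hardware.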

Notes:

  • The bottleneck is map[hash(keys[i])], which accesses memory randomly.

  • Normally, this would be if(tmp[i]) res[ret++] = i;. To avoid the if, I’m using ret += tmp[i].

  • map[..] is always 0 or 1.
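The two forms in the second note produce the same compacted output when every flag is 0 or 1. The sketch below (function names are hypothetical) puts them side by side. In the branchless form, the store address for iteration i+1 depends on the value loaded in iteration i, so a control dependency (a possibly mispredicted branch) is traded for a data dependency, which is the kind of dependency the question suspects.

```c
/* Branchy form from the note: store only when the flag is set.
   The if can be mispredicted when the flags are random. */
int compact_branchy(int *res, const char *flags, int n)
{
    int ret = 0;
    for (int i = 0; i < n; ++i)
        if (flags[i])
            res[ret++] = i;
    return ret;
}

/* Branchless form: always store, then advance the output index by the flag.
   Correct only when every flag is exactly 0 or 1. */
int compact_branchless(int *res, const char *flags, int n)
{
    int ret = 0;
    for (int i = 0; i < n; ++i) {
        res[ret] = i;     /* overwritten next iteration if flags[i] == 0 */
        ret += flags[i];  /* next store address now depends on this load */
    }
    return ret;
}
```

The branchless form also writes to res[ret] on iterations whose flag is 0, so the slot just past the returned count may hold a leftover index.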

The fission version is usually significantly faster, and I am trying to explain why. My best guess is that ret += map[..] still introduces a dependency that prevents speculative execution.

I would like to hear if anyone has a better explanation.

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 4:46 am

    From my tests, I get a roughly 2x speed difference between the fused and split loops. This difference is very consistent no matter how I tweak the loop.

    Fused: 1.096258 seconds
    Split: 0.562272 seconds
    

    (The full test code is at the bottom.)


    Although I’m not 100% sure, I suspect that this is due to a combination of two things:

    1. Saturation of the load-store buffer used for memory disambiguation, caused by the cache misses from map[gethash(keys[i])].
    2. An added dependency in the fused loop version.

    It’s clear that map[gethash(keys[i])] will result in a cache miss nearly every time. In fact, these misses are probably enough to saturate the entire load-store buffer.

    Now let’s look at the added dependency. The issue is the ret variable:

    int check_fused(int * res, char * map, int n, int * keys){
        int ret = 0;
        for(int i = 0; i < n; ++i){
            res[ret] = i;
            ret += map[gethash(keys[i])];
        }
        return ret;
    }
    

    The ret variable is needed for address resolution of the store res[ret] = i.

    • In the fused loop, ret comes from an almost certain cache miss.
    • In the split loop, ret comes from tmp[i], a sequential read that is much faster.

    In the fused loop, this delayed address resolution likely causes the store res[ret] = i to clog up the load-store buffer along with the loads from map[gethash(keys[i])].

    Since the load-store buffer has a fixed size but now holds twice as much pending work, only half as many cache misses can be overlapped as before. Thus the 2x slowdown.
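The argument above can be written as a back-of-envelope model. This is only a sketch of the answer's reasoning, not a hardware model: it assumes a buffer with a fixed number of slots, that each in-flight iteration holds one slot in the split loop's first pass and two in the fused loop (the pending load plus the store whose address is unresolved), and that run time is dominated by miss latency divided by the number of overlapping misses. The function name model_seconds() and all parameter values are hypothetical.

```c
/* Estimated run time in seconds under the simple overlap model described
   above. All inputs are hypothetical; only the ratio between calls with
   different slots_per_iter is meaningful. */
double model_seconds(double iterations, double miss_ns,
                     double buffer_slots, double slots_per_iter)
{
    /* How many cache misses can be outstanding at once. */
    double overlapping_misses = buffer_slots / slots_per_iter;
    /* Each "round" of overlapped misses costs one miss latency. */
    return iterations / overlapping_misses * miss_ns * 1e-9;
}
```

For any choice of miss latency and buffer size, doubling slots_per_iter doubles the estimate, which is the 2x the answer predicts. The model ignores the split version's second pass and every cost other than the misses, so its absolute numbers should not be compared with the measured timings.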


    Suppose we change the fused loop to this:

    int check_fused(int * res, char * map, int n, int * keys){
        int ret = 0;
        for(int i = 0; i < n; ++i){
            res[i] = i;    //  Changed "ret" to "i"
            ret += map[gethash(keys[i])];
        }
        return ret;
    }
    

    This will break the address resolution dependency.

    (Note that this is no longer equivalent to the original; it only demonstrates the performance difference.)

    Then we get similar timings:

    Fused: 0.487477 seconds
    Split: 0.574585 seconds
    

    Here’s the complete test code:

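    #include <stdio.h>   // printf
    #include <stdlib.h>  // calloc, rand, system
    #include <omp.h>     // omp_get_wtime (compile with OpenMP enabled, e.g. -fopenmp)
    
    #define SIZE 67108864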
    
    unsigned gethash(int key){
        return key & (SIZE - 1);
    }
    
    int check_fused(int * res, char * map, int n, int * keys){
        int ret = 0;
        for(int i = 0; i < n; ++i){
            res[ret] = i;
            ret += map[gethash(keys[i])];
        }
        return ret;
    }
    int check_split(int * res, char * map, int n, int * keys, int *tmp){
        int ret = 0;
        for(int i = 0; i < n; ++i){
            tmp[i] = map[gethash(keys[i])];
        }
        for(int i = 0; i < n; ++i){
            res[ret] = i;
            ret += tmp[i];
        }
        return ret;
    }
    
    
    int main()
    {
        char *map = (char*)calloc(SIZE,sizeof(char));
        int *keys =  (int*)calloc(SIZE,sizeof(int));
        int *res  =  (int*)calloc(SIZE,sizeof(int));
        int *tmp  =  (int*)calloc(SIZE,sizeof(int));
        if (map == NULL || keys == NULL || res == NULL || tmp == NULL){
            printf("Memory allocation failed.\n");
            system("pause");  // Windows-only: keeps the console window open
            return 1;
        }
    
        //  Generate Random Data
        for (int i = 0; i < SIZE; i++){
            keys[i] = (rand() & 0xff) | ((rand() & 0xff) << 16);
        }
    
        printf("Start...\n");
    
        double start = omp_get_wtime();
        int ret;
    
        ret = check_fused(res,map,SIZE,keys);
    //    ret = check_split(res,map,SIZE,keys,tmp);
    
        double end = omp_get_wtime();
    
        printf("ret = %d",ret);
        printf("\n\nseconds = %f\n",end - start);
    
        system("pause");  // Windows-only: keeps the console window open
    }
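The test code relies on omp_get_wtime() (OpenMP) and system("pause") (Windows). On a POSIX system, a minimal sketch of an equivalent timer, assuming clock_gettime() with CLOCK_MONOTONIC is available, could look like this; wall_seconds() is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict -std=c11 */
#include <time.h>

/* Monotonic wall-clock time in seconds; a POSIX stand-in for omp_get_wtime().
   Only differences between two calls are meaningful. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

It could replace both omp_get_wtime() calls in main(), and the system("pause") calls can simply be dropped outside Windows.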
    

© 2021 The Archive Base. All Rights Reserved