Asked by Editorial Team on June 10, 2026

Here is the extract from the program in question. The matrix img[][] has the size SIZE×SIZE, and is initialized as:

img[j][i] = 2 * j + i

Then a matrix res[][] is computed in which each element is the average of the 9 elements of the 3×3 block of img centred on it (the element itself and its 8 neighbours). The border is left at 0 for simplicity.

for(i=1;i<SIZE-1;i++) 
    for(j=1;j<SIZE-1;j++) {
        res[j][i]=0;
        for(k=-1;k<2;k++) 
            for(l=-1;l<2;l++) 
                res[j][i] += img[j+l][i+k];
        res[j][i] /= 9;
    }

That’s all there is to the program. For completeness’ sake, here is what comes before it; no code comes after. As you can see, it’s just initialization.

#define SIZE 8192
float img[SIZE][SIZE]; // input image
float res[SIZE][SIZE]; //result of mean filter
int i,j,k,l;
for(i=0;i<SIZE;i++) 
    for(j=0;j<SIZE;j++) 
        img[j][i] = (2*j+i)%8196;

This program is slow when SIZE is a multiple of 2048. For example, the execution times are:

SIZE = 8191: 3.44 secs
SIZE = 8192: 7.20 secs
SIZE = 8193: 3.18 secs

The compiler is GCC.
From what I know, this has to do with memory management, but I don’t know much about that subject, which is why I’m asking here.

It would also be nice to know how to fix this, but an explanation of these execution times would already make me happy.

I already know about malloc/free, but the problem is not the amount of memory used, only the execution time, so I don’t see how that would help.

1 Answer

    Editorial Team answered on June 10, 2026 at 10:53 pm

    The difference is caused by the same super-alignment issue from the following related questions:

    • Why is transposing a matrix of 512×512 much slower than transposing a matrix of 513×513?
    • Matrix multiplication: Small difference in matrix size, large difference in timings

    But the alignment only causes trouble here because there’s one other problem with the code.

    Starting from the original loop:

    for(i=1;i<SIZE-1;i++) 
        for(j=1;j<SIZE-1;j++) {
            res[j][i]=0;
            for(k=-1;k<2;k++) 
                for(l=-1;l<2;l++) 
                    res[j][i] += img[j+l][i+k];
            res[j][i] /= 9;
        }
    

    First notice that the two inner loops are trivial. They can be unrolled as follows:

    for(i=1;i<SIZE-1;i++) {
        for(j=1;j<SIZE-1;j++) {
            res[j][i]=0;
            res[j][i] += img[j-1][i-1];
            res[j][i] += img[j  ][i-1];
            res[j][i] += img[j+1][i-1];
            res[j][i] += img[j-1][i  ];
            res[j][i] += img[j  ][i  ];
            res[j][i] += img[j+1][i  ];
            res[j][i] += img[j-1][i+1];
            res[j][i] += img[j  ][i+1];
            res[j][i] += img[j+1][i+1];
            res[j][i] /= 9;
        }
    }
    

    That leaves the two outer loops, which are the ones we’re interested in.

    Now we can see the problem is the same as in this question: Why does the order of the loops affect performance when iterating over a 2D array?

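    You are iterating the matrix column-wise instead of row-wise. C stores 2D arrays in row-major order, so img[j][i] and img[j][i+1] are adjacent in memory, while img[j][i] and img[j+1][i] are SIZE * sizeof(float) bytes apart (32 KiB when SIZE is 8192). With j in the inner loop, each step brings in a cache line from a new row, and that line is only used again on the next pass of the outer loop, SIZE rows later, by which time it may already have been evicted.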


    To solve this problem, you should interchange the two loops.

    for(j=1;j<SIZE-1;j++) {
        for(i=1;i<SIZE-1;i++) {
            res[j][i]=0;
            res[j][i] += img[j-1][i-1];
            res[j][i] += img[j  ][i-1];
            res[j][i] += img[j+1][i-1];
            res[j][i] += img[j-1][i  ];
            res[j][i] += img[j  ][i  ];
            res[j][i] += img[j+1][i  ];
            res[j][i] += img[j-1][i+1];
            res[j][i] += img[j  ][i+1];
            res[j][i] += img[j+1][i+1];
            res[j][i] /= 9;
        }
    }
    

    This eliminates the non-sequential access entirely, so you no longer get the slowdowns on large powers of two.


    Core i7 920 @ 3.5 GHz

    Original code:

    8191: 1.499 seconds
    8192: 2.122 seconds
    8193: 1.582 seconds
    

    Interchanged outer loops:

    8191: 0.376 seconds
    8192: 0.357 seconds
    8193: 0.351 seconds
    
