The Archive Base


Editorial Team
Asked: May 16, 2026


I have some code that runs fairly well, but I would like to make it run better. The major problem I have with it is that it needs to have a nested for loop. The outer one is for iterations (which must happen serially), and the inner one is for each point particle under consideration. I know there’s not much I can do about the outer one, but I’m wondering if there is a way of optimizing something like:

    void collide(particle particles[], box boxes[],
                 double boxShiftX, double boxShiftY) {
        int i;
        double nX;
        double nY;
        int boxnum;
        for (i = 0; i < PART_COUNT; i++) {
            // copied and pasted the macro, which is why it's kinda odd looking
            boxnum = ((((int)(particles[i].sX+boxShiftX))/BOX_SIZE)%BWIDTH+
                BWIDTH*((((int)(particles[i].sY+boxShiftY))/BOX_SIZE)%BHEIGHT));

            particles[i].vX -= boxes[boxnum].mX;
            particles[i].vY -= boxes[boxnum].mY;
            if (boxes[boxnum].rotDir == 1) {
                nX = particles[i].vX*Wxx+particles[i].vY*Wxy;
                nY = particles[i].vX*Wyx+particles[i].vY*Wyy;
            } else { // to make it randomly pick a rot. direction
                nX = particles[i].vX*Wxx-particles[i].vY*Wxy;
                nY = -particles[i].vX*Wyx+particles[i].vY*Wyy;
            }
            particles[i].vX = nX + boxes[boxnum].mX;
            particles[i].vY = nY + boxes[boxnum].mY;
        }
    }

I’ve looked at SIMD, though I can’t find much about it, and I’m not entirely sure that the processing required to properly extract and pack the data would be worth the gain of doing half as many instructions, since apparently only two doubles can be used at a time.

I tried breaking it up into multiple threads with shm and pthread_barrier (to synchronize the different stages, of which the above code is one), but it just made it slower.

My current code does go pretty quickly; it’s on the order of one second per 10M particle*iterations, and from what I can tell from gprof, 30% of my time is spent in that function alone (5000 calls; PART_COUNT=8192 particles took 1.8 seconds). I’m not worried about small, constant time things, it’s just that 512K particles * 50K iterations * 1000 experiments took more than a week last time.

I guess my question is if there is any way of dealing with these long vectors that is more efficient than just looping through them. I feel like there should be, but I can’t find it.


1 Answer

  1. Editorial Team
    Added an answer on May 16, 2026 at 12:15 am

    I’m not sure how much SIMD would benefit; the inner loop is pretty small and simple, so I’d guess (just by looking) that you’re probably more memory-bound than anything else. With that in mind, I’d try rewriting the main part of the loop to not touch the particles array more than needed:

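    const double temp_vX = particles[i].vX - boxes[boxnum].mX;
    const double temp_vY = particles[i].vY - boxes[boxnum].mY;

    if(boxes[boxnum].rotDir == 1)
    {
        nX = temp_vX*Wxx+temp_vY*Wxy;
        nY = temp_vX*Wyx+temp_vY*Wyy;
    }
    else
    {
        //to make it randomly pick a rot. direction
        nX =  temp_vX*Wxx-temp_vY*Wxy;
        nY = -temp_vX*Wyx+temp_vY*Wyy;
    }
    // add the box's mean velocity back, as the original code does
    particles[i].vX = nX + boxes[boxnum].mX;
    particles[i].vY = nY + boxes[boxnum].mY;

    This reads and writes each velocity component once per particle instead of twice. The box's mean velocity still has to be added back at the end, as in the original; dropping that addition would change the result.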


    Another potential speedup would be to use __restrict (a compiler extension; standard C99 spells it restrict) on the particle array, so that the compiler can better optimize the writes to the velocities. Also, if Wxx etc. are global variables, they may have to be reloaded each time through the loop instead of being kept in registers; using __restrict would help with that too.
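As a rough sketch of the two ideas above, the loop could take restrict-qualified pointers and copy the rotation coefficients into locals once per call. The struct layouts, the grid constants, the name `collide_restrict`, and the assumption that `Wxx` etc. are file-scope doubles are all hypothetical, since the question does not show them. Whether this helps at all depends on the compiler and flags, so it needs measuring.

```c
#include <stddef.h>

/* Hypothetical grid constants and struct layouts; the question does not show them. */
#define BOX_SIZE 1
#define BWIDTH   4
#define BHEIGHT  4

typedef struct { double sX, sY, vX, vY; } particle;
typedef struct { double mX, mY; int rotDir; } box;

/* Assumed file-scope rotation coefficients (here a 90-degree rotation). */
double Wxx = 0.0, Wxy = -1.0, Wyx = 1.0, Wyy = 0.0;

void collide_restrict(particle *restrict particles, const box *restrict boxes,
                      size_t count, double boxShiftX, double boxShiftY)
{
    /* Read the globals once; the locals can then stay in registers. */
    const double wxx = Wxx, wxy = Wxy, wyx = Wyx, wyy = Wyy;

    for (size_t i = 0; i < count; i++) {
        /* Same index expression as the question; assumes non-negative positions. */
        const int boxnum =
            (((int)(particles[i].sX + boxShiftX)) / BOX_SIZE) % BWIDTH
            + BWIDTH * ((((int)(particles[i].sY + boxShiftY)) / BOX_SIZE) % BHEIGHT);
        const box *b = &boxes[boxnum];

        /* Velocity relative to the box's mean velocity. */
        const double vx = particles[i].vX - b->mX;
        const double vy = particles[i].vY - b->mY;

        double nx, ny;
        if (b->rotDir == 1) {
            nx = vx * wxx + vy * wxy;
            ny = vx * wyx + vy * wyy;
        } else {
            nx =  vx * wxx - vy * wxy;
            ny = -vx * wyx + vy * wyy;
        }

        /* One write per component, with the mean velocity added back. */
        particles[i].vX = nx + b->mX;
        particles[i].vY = ny + b->mY;
    }
}
```

The local copies are the more portable half of this: they do not rely on the compiler proving anything about aliasing, whereas `restrict` is a promise the caller has to keep.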


    Since you’re accessing the particles in order, you can try prefetching (e.g. __builtin_prefetch on GCC) a few particles ahead to reduce cache misses. Prefetching on the boxes is a bit tougher since you’re accessing them in an unpredictable order; you could try something like

    int nextBoxnum = ((((int)(particles[i+1].sX+boxShiftX) /// etc...
    // prefetch boxes[nextBoxnum]
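A fuller sketch of the prefetching idea, assuming GCC or Clang (`__builtin_prefetch` is a compiler builtin, not standard C). The struct layouts, grid constants, coefficient values, the `PREFETCH_AHEAD` distance, and the names `box_index` and `collide_prefetch` are hypothetical. Note the bounds checks: reading `particles[i+1]` on the last iteration would run past the end of the array.

```c
#include <stddef.h>

/* Hypothetical grid constants and struct layouts; the question does not show them. */
#define BOX_SIZE 1
#define BWIDTH   4
#define BHEIGHT  4
#define PREFETCH_AHEAD 8   /* a guess; the useful distance has to be measured */

typedef struct { double sX, sY, vX, vY; } particle;
typedef struct { double mX, mY; int rotDir; } box;

/* Assumed rotation coefficients (here a 90-degree rotation). */
static const double Wxx = 0.0, Wxy = -1.0, Wyx = 1.0, Wyy = 0.0;

/* Same index expression as the question; assumes non-negative positions. */
static inline int box_index(const particle *p, double shiftX, double shiftY)
{
    return (((int)(p->sX + shiftX)) / BOX_SIZE) % BWIDTH
         + BWIDTH * ((((int)(p->sY + shiftY)) / BOX_SIZE) % BHEIGHT);
}

void collide_prefetch(particle *particles, const box *boxes, size_t count,
                      double boxShiftX, double boxShiftY)
{
    if (count == 0)
        return;

    int boxnum = box_index(&particles[0], boxShiftX, boxShiftY);
    for (size_t i = 0; i < count; i++) {
        /* Particles are visited in order, so fetch a few ahead for writing. */
        if (i + PREFETCH_AHEAD < count)
            __builtin_prefetch(&particles[i + PREFETCH_AHEAD], 1, 3);

        /* Boxes are visited unpredictably: compute the next index early. */
        int nextBoxnum = boxnum;
        if (i + 1 < count) {
            nextBoxnum = box_index(&particles[i + 1], boxShiftX, boxShiftY);
            __builtin_prefetch(&boxes[nextBoxnum], 0, 3);
        }

        const box *b = &boxes[boxnum];
        const double vx = particles[i].vX - b->mX;
        const double vy = particles[i].vY - b->mY;
        double nx, ny;
        if (b->rotDir == 1) {
            nx = vx * Wxx + vy * Wxy;
            ny = vx * Wyx + vy * Wyy;
        } else {
            nx =  vx * Wxx - vy * Wxy;
            ny = -vx * Wyx + vy * Wyy;
        }
        particles[i].vX = nx + b->mX;
        particles[i].vY = ny + b->mY;

        boxnum = nextBoxnum;   /* reuse the index instead of recomputing it */
    }
}
```

Carrying `nextBoxnum` into the next iteration means the index expression is still evaluated only once per particle. Hardware prefetchers already handle the sequential particle accesses well on many CPUs, so the box prefetch is the part more likely to matter.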
    

    One last one that I just noticed – if box::rotDir is always +/- 1.0, then you can eliminate the comparison and branch in the inner loop like this:

    const double rot = boxes[boxnum].rotDir; // always +/- 1.0
    nX =     particles[i].vX*Wxx + rot*particles[i].vY*Wxy;
    nY = rot*particles[i].vX*Wyx +     particles[i].vY*Wyy;
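To check that the multiply-by-sign form matches the branch, the two can be written side by side as small functions. The `vec2` type, the function names, and the coefficient values are hypothetical, and the sketch assumes the direction is only ever +1 or -1.

```c
/* Hypothetical helper type for a 2D velocity. */
typedef struct { double x, y; } vec2;

/* Assumed rotation coefficients (here a 90-degree rotation). */
static const double Wxx = 0.0, Wxy = -1.0, Wyx = 1.0, Wyy = 0.0;

/* Version from the question: pick the rotation direction with a branch. */
vec2 rotate_branchy(vec2 v, int rotDir)
{
    vec2 n;
    if (rotDir == 1) {
        n.x = v.x * Wxx + v.y * Wxy;
        n.y = v.x * Wyx + v.y * Wyy;
    } else {
        n.x =  v.x * Wxx - v.y * Wxy;
        n.y = -v.x * Wyx + v.y * Wyy;
    }
    return n;
}

/* Branch-free version: rot must be exactly +1.0 or -1.0. */
vec2 rotate_branchless(vec2 v, double rot)
{
    vec2 n;
    n.x =       v.x * Wxx + rot * v.y * Wxy;
    n.y = rot * v.x * Wyx +       v.y * Wyy;
    return n;
}
```

If `rotDir` is stored as an int, it can be converted once per particle with `(double)boxes[boxnum].rotDir`. The trade is two extra multiplications for one unpredictable branch, which is usually a win when the direction is random but is still worth timing.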
    

    Naturally, the usual caveats of profiling before and after apply. But I think all of these might help, and can be done regardless of whether or not you switch to SIMD.


© 2021 The Archive Base. All Rights Reserved