The Archive Base

Editorial Team
Asked: May 18, 2026 (2026-05-18T20:18:27+00:00)

For simplicity lets assume that I have a vector of N matrices each of


For simplicity, let's assume that I have a vector of N matrices, each with M rows. I am using std::accumulate to compute the sum of all the matrices. I pass a binary functor that accepts two matrices (by reference) and returns their sum (by reference). Full disclosure: I am using libstdc++ parallel mode. Inside the functor, I loop over the rows individually to compute the sum.
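To make the setup concrete, here is a minimal sketch of that matrix-at-a-time approach. The names `Row`, `Matrix`, `AddMatrices`, and `sumAll` are hypothetical stand-ins, not the actual code in question, and the functor returns by value rather than by reference to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-ins: a Row is a vector of doubles, a Matrix is M rows.
typedef std::vector<double> Row;
typedef std::vector<Row> Matrix;

// Binary functor: adds iRhs into a copy of iLhs, one row at a time.
struct AddMatrices {
  Matrix operator()(Matrix iLhs, const Matrix& iRhs) const {
    assert(iLhs.size() == iRhs.size());
    for (std::size_t r = 0; r < iLhs.size(); ++r) {
      assert(iLhs[r].size() == iRhs[r].size());
      for (std::size_t c = 0; c < iLhs[r].size(); ++c) {
        iLhs[r][c] += iRhs[r][c];
      }
    }
    return iLhs;
  }
};

// Matrix-major order: every accumulate step walks all M rows of one matrix
// before the next matrix is touched. Assumes a non-empty vector.
Matrix sumAll(const std::vector<Matrix>& iMatrices) {
  assert(!iMatrices.empty());
  return std::accumulate(iMatrices.begin() + 1, iMatrices.end(),
                         iMatrices.front(), AddMatrices());
}
```

With this structure the loop over matrices lives inside std::accumulate and the loop over rows lives inside the functor, which is what fixes the nesting order.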

Although each matrix is too large to fit in the cache, a single row fits comfortably. It would therefore be advantageous to reorder the loops so that the outer loop iterates over the M rows and the inner loop over the N matrices. Besides defining the functor inline, is there anything else I can do to encourage such a loop reordering across the function boundary? I can of course restructure the code, but I would ideally like to keep the simple structure that the STL algorithms afford. A gcc-specific solution would be fine too.

I am not actually dealing with matrices; that was just an example, but the same problem structure applies. The main issue is performance. Explaining the actual scenario would be too cumbersome, but the core problem is this: std::accumulate imposes an ordering on the nested loops that isn't very cache friendly, because it completes the addition of two whole objects before moving on to the next object. A single object is too big to be held in cache, but parts of it can be. So execution can be sped up by computing the ‘additions’ one ‘part’ at a time (over all objects). Reordering the loops by hand leads to a substantial improvement in FLOPS. But I would ideally like the compiler to do the reordering so that I can code at the STL level (as far as possible). So I am looking for tricks to do this.
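For reference, the hand-reordered version described above might look like the following sketch. It uses the same hypothetical `Row` and `Matrix` typedefs (vectors of doubles), and `sumRowMajor` is an illustrative name, not code from the actual project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: a Row is a vector of doubles, a Matrix is M rows.
typedef std::vector<double> Row;
typedef std::vector<Row> Matrix;

// Row-major order: the outer loop picks one row index, the inner loop visits
// that row in every matrix, so the output row can stay in cache while it is
// being accumulated. Assumes a non-empty vector of equally shaped matrices.
Matrix sumRowMajor(const std::vector<Matrix>& iMatrices) {
  assert(!iMatrices.empty());
  Matrix result(iMatrices.front());  // seed the sum with the first matrix
  for (std::size_t r = 0; r < result.size(); ++r) {
    Row& out = result[r];
    for (std::size_t n = 1; n < iMatrices.size(); ++n) {
      const Row& in = iMatrices[n][r];
      assert(in.size() == out.size());
      for (std::size_t c = 0; c < out.size(); ++c) {
        out[c] += in[c];
      }
    }
  }
  return result;
}
```

The arithmetic is identical to the matrix-at-a-time version; only the nesting of the row loop and the matrix loop is swapped.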

1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 8:18 pm (2026-05-18T20:18:28+00:00)
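    #include <numeric>  // std::accumulate

    // Placeholders: the complete definitions must be visible before SumNRow.
    // Matrix needs rowCount() and operator[] returning a Row; Row needs operator+.
    class Matrix;
    class Row;

    // Adds row _rowIdx of one matrix to a running Row sum.
    struct SumNRow {
      int _rowIdx;
    //  Row _tempRow; //For return by reference left out for simplicity
      explicit SumNRow(int iRowIdx) : _rowIdx(iRowIdx) {}
      // std::accumulate calls op(accumulator, element), so the left operand is a Row.
      Row operator()(const Row & iSum, const Matrix & iMatrix) const {
        return iSum + iMatrix[_rowIdx];
      }
    };

    // Outer loop over rows, inner loop (inside accumulate) over matrices.
    // Assumes a non-empty range of matrices with equal row counts.
    template<class MatrixIterator>
    void sum(MatrixIterator iMatrixStart, MatrixIterator iMatrixEnd, Matrix & oMatrix) {
      MatrixIterator second = iMatrixStart;
      ++second;
      for (int i = 0; i < iMatrixStart->rowCount(); ++i) {
        // Seed with row i of the first matrix, then fold in the remaining ones.
        oMatrix[i] = std::accumulate(second, iMatrixEnd, (*iMatrixStart)[i], SumNRow(i));
      }
    }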
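The snippet above only forward-declares `Matrix` and `Row`, so it does not compile on its own. A self-contained sketch of the same per-row std::accumulate idea follows; the `Row` and `Matrix` definitions here are hypothetical minimal stand-ins chosen for illustration, and the functor and `sum` follow the answer's shape with a Row accumulator and an explicit initial value.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical minimal Row: a vector of doubles with element-wise addition.
struct Row {
  std::vector<double> values;
  Row() {}
  explicit Row(std::size_t n, double v = 0.0) : values(n, v) {}
};

inline Row operator+(const Row& a, const Row& b) {
  assert(a.values.size() == b.values.size());
  Row out(a);
  for (std::size_t c = 0; c < out.values.size(); ++c) out.values[c] += b.values[c];
  return out;
}

// Hypothetical minimal Matrix: a fixed number of rows, indexable by row.
class Matrix {
 public:
  Matrix(std::size_t rows, std::size_t cols, double v = 0.0)
      : _rows(rows, Row(cols, v)) {}
  int rowCount() const { return static_cast<int>(_rows.size()); }
  Row& operator[](int i) { return _rows[i]; }
  const Row& operator[](int i) const { return _rows[i]; }
 private:
  std::vector<Row> _rows;
};

// Adds row _rowIdx of one matrix to a running Row sum.
struct SumNRow {
  int _rowIdx;
  explicit SumNRow(int iRowIdx) : _rowIdx(iRowIdx) {}
  Row operator()(const Row& iSum, const Matrix& iMatrix) const {
    return iSum + iMatrix[_rowIdx];
  }
};

// One std::accumulate per row: the outer loop runs over rows, the inner loop
// (inside accumulate) runs over matrices. Assumes a non-empty range.
template <class MatrixIterator>
void sum(MatrixIterator iMatrixStart, MatrixIterator iMatrixEnd, Matrix& oMatrix) {
  assert(iMatrixStart != iMatrixEnd);
  MatrixIterator second = iMatrixStart;
  ++second;
  for (int i = 0; i < iMatrixStart->rowCount(); ++i) {
    oMatrix[i] = std::accumulate(second, iMatrixEnd, (*iMatrixStart)[i], SumNRow(i));
  }
}
```

A call such as `sum(matrices.begin(), matrices.end(), result)` then fills `result` one row at a time. Note that this `operator+` returns a fresh Row on every step; the commented-out `_tempRow` member in the answer hints at reusing a buffer to avoid that, which this sketch does not attempt.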
    
