The Archive Base

Editorial Team
Asked: June 10, 2026

I was wondering if there’s a neater (or better yet, more efficient), method of


I was wondering if there’s a neater (or, better yet, more efficient) way of summing the values of a vector or (asymmetric) matrix that are pointed to by a collection of indices. (A matrix with structure such as symmetry could of course be exploited in the loop, but that isn’t pertinent to my question.) Basically, this code could be used to calculate, say, the cost of a route through a 2D matrix. I’m looking for a way to utilize the CPU, not the GPU.

Here’s some relevant code; the case I’m more interested in is the first one. I was thinking it would be possible to use std::accumulate with a lambda that captures the indices vector, but then I got wondering whether there’s already a neater way, perhaps with some other operator. It’s not a “real problem”, as the loop is quite clear for my tastes too, but I’m on the hunt for a super-neat or more efficient one-liner…

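#include <cstddef>
#include <vector>

// Flattened row-major matrix. The row stride is taken to be indices.size(),
// i.e. the route is assumed to visit every row of a square matrix exactly once.
template<typename out_type>
out_type sum(std::vector<float> const& matrix, std::vector<int> const& indices)
{
    out_type cost = 0;
    if(indices.empty()) return cost; // size() - 1 would wrap around for an empty route

    const std::size_t n = indices.size();
    for(std::size_t i = 0; i + 1 < n; ++i)
    {
        cost += matrix[n * indices[i] + indices[i + 1]];
    }

    // Close the loop: last stop back to the first.
    cost += matrix[n * indices[n - 1] + indices[0]];

    return cost;
}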

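// Nested-vector matrix: matrix[from][to] is the cost of one leg of the route.
template<typename out_type>
out_type sum(std::vector<std::vector<float>> const& matrix, std::vector<int> const& indices)
{
    out_type cost = 0;
    if(indices.empty()) return cost; // size() - 1 would wrap around for an empty route

    for(decltype(indices.size()) i = 0; i + 1 < indices.size(); ++i)
    {
        cost += matrix[indices[i]][indices[i + 1]];
    }
    // Close the loop: last stop back to the first.
    cost += matrix[indices[indices.size() - 1]][indices[0]];

    return cost;
}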

Oh, and PPL/TBB are fair game too.

Edit

As an afterthought and as commented to John, would there be a place to employ std::common_type in the calculation as the input and output types may differ? This is a bit of hand-waving and more like learning techniques and libraries. A form of code kata, if you will.

Edit 2

Now, there’s one option for making the loops faster, explained in the blog post How to process a STL vector using SSE code by the blogger theowl84. The code uses __m128 directly, but I wonder if there’s something in the DirectXMath library too.

Edit 3

Now, after writing some concrete code, I found that std::accumulate wouldn’t get me far. Or at least I couldn’t find a neat way to do the [indices[i + 1]] part of matrix[indices[i]][indices[i + 1]], as std::accumulate itself gives access only to the current value and the running sum. In that light, it looks like novelocrat’s approach would be the most fruitful one.
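
One possible way around the "current value and sum only" limitation, offered as a sketch rather than as the approach settled on in this thread: std::inner_product walks two ranges in lockstep, so passing the indices together with the same indices shifted by one hands each adjacent pair to a callable. The name route_cost is made up for illustration, and the closing leg back to the first stop is added by hand.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: cost of a closed route through a nested-vector matrix.
template<typename out_type>
out_type route_cost(std::vector<std::vector<float>> const& matrix,
                    std::vector<int> const& indices)
{
    if(indices.empty()) return out_type(0);

    // Pair indices[i] with indices[i + 1] by offsetting the second range by one.
    out_type cost = std::inner_product(
        indices.begin(), indices.end() - 1, indices.begin() + 1, out_type(0),
        std::plus<out_type>(),
        [&matrix](int from, int to) { return matrix[from][to]; });

    // Close the loop from the last stop back to the first.
    return cost + matrix[indices.back()][indices.front()];
}
```

Called as route_cost<double>(matrix, indices), this keeps the loop body out of sight at the cost of one explicit wrap-around term.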

DeadMG proposed using parallel_reduce, with associativity caveats, which novelocrat commented on further. I didn’t check whether I could use parallel_reduce, as the interface looked somewhat cumbersome for a quick try. Other than that, even though my code executes serially, it would suffer from the same floating-point summation issues as the parallel reduction version, though I think the parallel version could be (much) more unpredictable than the serial one.
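
The associativity caveat can be made concrete with a small sketch. The two hypothetical functions below add the same three floats in different groupings, which is in effect what a reduction does when it regroups the terms.

```cpp
// Sums three floats left to right: (a + b) + c.
float sum_left(float a, float b, float c)
{
    float ab = a + b;
    return ab + c;
}

// Sums the same three floats right to left: a + (b + c).
float sum_right(float a, float b, float c)
{
    float bc = b + c;
    return a + bc;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, sum_left yields 1 while sum_right yields 0, assuming the platform evaluates float arithmetic in IEEE 754 single precision: the 1 is smaller than the spacing between neighbouring floats near 1e8, so it is lost when added to b first.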

This is somewhat tangential, but it may be of interest to some stumbling here: those who have read this far may be (very) interested in the article Wandering Precision on The NAG blog, which details some intricacies that are introduced even by hardware instruction re-ordering! Then there are some ruminations about this very issue in a distributed setting in the #AltDevBlogADay article Synchronous RTS Engines and a Tale of Desyncs. Also, ACCU (the general mailing list is excellent, by the way, and it’s free to join) features several articles (e.g. this) on floating-point accuracy. As a tangent to the tangent, I found Fernando Cacciola’s Robustness issues in geometric computing, originally from the ACCU mailing list, to be a good article to read.

And then there’s std::common_type. I couldn’t find a use for it here. If I had two different types as parameters, then the return type could/should be decided by std::common_type. Perhaps more pertinent is std::is_convertible with static_assert, to make sure the desired result type is convertible from the argument types (with a clean error message). Other than that, I can only think of a check that the accuracy of the return value/intermediate calculation value is sufficient to represent the result of the summation without overflows and things like that, but I haven’t come across a standard facility for that.
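
A minimal sketch of both ideas, with made-up names (checked_sum, add_mixed) and a plain vector standing in for the matrix: the static_assert turns an unsuitable result type into a readable compile-time error, and std::common_type picks a result type only when two different input types meet.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical sum that is generic over the element type and checks, at
// compile time, that the requested result type can hold an element.
template<typename out_type, typename in_type>
out_type checked_sum(std::vector<in_type> const& values)
{
    static_assert(std::is_convertible<in_type, out_type>::value,
                  "out_type must be convertible from the element type");

    out_type total = out_type();
    for(in_type const& v : values)
    {
        total += v;
    }
    return total;
}

// std::common_type decides the result type when two input types differ.
template<typename a_type, typename b_type>
typename std::common_type<a_type, b_type>::type add_mixed(a_type a, b_type b)
{
    return a + b;
}
```

Here checked_sum<double>(values) compiles for a vector of float, while a result type that float cannot convert to would fail with the message above. Neither facility says anything about overflow or lost precision.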

That’s about it, I think, ladies and gentlemen. I enjoyed myself, and I hope those reading this got something out of it too.

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 11:08 pm

    You could produce an iterator that takes matrix and indices and yields the appropriate values.

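    #include <cstddef>
    #include <iterator>
    #include <vector>

    class route_iterator
    {
      // Pointers rather than references, so the iterator stays copy-assignable.
      std::vector<std::vector<float>> const* matrix;
      std::vector<int> const* indices;
      std::size_t i;

    public:
      // Member types the standard algorithms expect from an input iterator.
      using iterator_category = std::input_iterator_tag;
      using value_type = float;
      using difference_type = std::ptrdiff_t;
      using pointer = float const*;
      using reference = float;

      route_iterator(std::vector<std::vector<float>> const& matrix_,
                     std::vector<int> const& indices_, std::size_t begin = 0)
      : matrix(&matrix_), indices(&indices_), i(begin)
      { }
      // Cost of the leg from stop i to the next stop, wrapping to the first.
      float operator*() const {
        return (*matrix)[(*indices)[i]][(*indices)[(i + 1) % indices->size()]];
      }
      route_iterator& operator++() {
        ++i;
        return *this;
      }
      // std::accumulate compares against the end iterator, so these are needed.
      bool operator==(route_iterator const& other) const { return i == other.i; }
      bool operator!=(route_iterator const& other) const { return i != other.i; }
    };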
    

    Then your accumulate runs from route_iterator(matrix, indices) to route_iterator(matrix, indices, indices.size()).
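
As a self-contained sketch of that call: edge_cost_iterator below is a hypothetical stand-in for the iterator described above, reduced to what std::accumulate needs (dereference, pre-increment and comparison), and accumulate_route is a made-up wrapper name.

```cpp
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical stand-in iterator: yields the cost of each leg of the route.
class edge_cost_iterator
{
    std::vector<std::vector<float>> const* matrix;
    std::vector<int> const* indices;
    std::size_t i;

public:
    using iterator_category = std::input_iterator_tag;
    using value_type = float;
    using difference_type = std::ptrdiff_t;
    using pointer = float const*;
    using reference = float;

    edge_cost_iterator(std::vector<std::vector<float>> const& m,
                       std::vector<int> const& idx, std::size_t begin = 0)
    : matrix(&m), indices(&idx), i(begin) { }

    // Leg from stop i to stop i + 1, wrapping back to the first stop.
    float operator*() const {
        return (*matrix)[(*indices)[i]][(*indices)[(i + 1) % indices->size()]];
    }
    edge_cost_iterator& operator++() { ++i; return *this; }
    bool operator==(edge_cost_iterator const& o) const { return i == o.i; }
    bool operator!=(edge_cost_iterator const& o) const { return i != o.i; }
};

// The begin iterator starts at leg 0, the end iterator sits one past the last leg.
template<typename out_type>
out_type accumulate_route(std::vector<std::vector<float>> const& matrix,
                          std::vector<int> const& indices)
{
    return std::accumulate(edge_cost_iterator(matrix, indices),
                           edge_cost_iterator(matrix, indices, indices.size()),
                           out_type(0));
}
```

Because the wrap-around lives in operator*, the closing leg needs no special case, and an empty route sums to zero without ever dereferencing the iterator.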

    Admittedly, though, this runs sequentially unless a smart compiler turns it into something parallel. What you really want are parallel map and fold (accumulate) operations.
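
One way to phrase the route cost as a map followed by a fold, assuming a C++17 toolchain, is std::transform_reduce from <numeric>. This is a sketch with a made-up function name (map_fold_cost), not something proposed in the thread, and the overload used here runs sequentially. C++17 also defines overloads that take an execution policy such as std::execution::par as the first argument; whether those actually run in parallel depends on the standard library in use, and the floating-point regrouping caveats still apply.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: map each adjacent pair of stops to its leg cost,
// then fold the leg costs with addition.
template<typename out_type>
out_type map_fold_cost(std::vector<std::vector<float>> const& matrix,
                       std::vector<int> const& indices)
{
    if(indices.empty()) return out_type(0);

    // Map: (from, to) -> matrix[from][to]. Fold: std::plus.
    out_type open_path = std::transform_reduce(
        indices.begin(), indices.end() - 1, indices.begin() + 1, out_type(0),
        std::plus<>(),
        [&matrix](int from, int to) -> out_type { return matrix[from][to]; });

    // Close the loop from the last stop back to the first.
    return open_path + matrix[indices.back()][indices.front()];
}
```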
