I’m performing matrix multiplication with this simple algorithm. To be more flexible I used

Question

0

Asked: May 18, 20262026-05-18T08:32:51+00:00 2026-05-18T08:32:51+00:00

I’m performing matrix multiplication with this simple algorithm. To be more flexible I used

0

I’m performing matrix multiplication with this simple algorithm. To be more flexible I used objects for the matricies which contain dynamicly created arrays.

Comparing this solution to my first one with static arrays it is 4 times slower. What can I do to speed up the data access? I don’t want to change the algorithm.

 matrix mult_std(matrix a, matrix b) {
 matrix c(a.dim(), false, false);
 for (int i = 0; i < a.dim(); i++)
  for (int j = 0; j < a.dim(); j++) {
   int sum = 0;
   for (int k = 0; k < a.dim(); k++)
    sum += a(i,k) * b(k,j);
   c(i,j) = sum;
  }

 return c;
}

EDIT

I corrected my Question avove! I added the full source code below and tried some of your advices:

swapped k and j loop iterations -> performance improvement
declared dim() and operator()() as inline -> performance improvement
passing arguments by const reference -> performance loss! why? so I don’t use it.

The performance is now nearly the same as it was in the old porgram. Maybe there should be a bit more improvement.

But I have another problem: I get a memory error in the function mult_strassen(...). Why?
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc

OLD PROGRAM
main.c http://pastebin.com/qPgDWGpW

c99 main.c -o matrix -O3

NEW PROGRAM
matrix.h http://pastebin.com/TYFYCTY7
matrix.cpp http://pastebin.com/wYADLJ8Y
main.cpp http://pastebin.com/48BSqGJr

g++ main.cpp matrix.cpp -o matrix -O3.

EDIT

Here are some results. Comparison between standard algorithm (std), swapped order of j and k loop (swap) and blocked algortihm with block size 13 (block).
alt text

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-18T08:32:52+00:00

Pass the parameters by const reference to start with:

matrix mult_std(matrix const& a, matrix const& b) {

To give you more details we need to know the details of the other methods used.
And to answer why the original method is 4 times faster we would need to see the original method.

The problem is undoubtedly yours as this problem has been solved a million times before.

Also when asking this type of question ALWAYS provide compilable source with appropriate inputs so we can actually build and run the code and see what is happening.

Without the code we are just guessing.

Edit

After fixing the main bug in the original C code (a buffer over-run)

I have update the code to run the test side by side in a fair comparison:

 // INCLUDES -------------------------------------------------------------------
 #include <stdlib.h>
 #include <stdio.h>
 #include <sys/time.h>
 #include <time.h>

 // DEFINES -------------------------------------------------------------------
 // The original problem was here. The MAXDIM was 500. But we were using arrays
 // that had a size of 512 in each dimension. This caused a buffer overrun that
 // the dim variable and caused it to be reset to 0. The result of this was causing
 // the multiplication loop to fall out before it had finished (as the loop was
 // controlled by this global variable.
 //
 // Everything now uses the MAXDIM variable directly.
 // This of course gives the C code an advantage as the compiler can optimize the
 // loop explicitly for the fixed size arrays and thus unroll loops more efficiently.
 #define MAXDIM 512
 #define RUNS 10

 // MATRIX FUNCTIONS ----------------------------------------------------------
 class matrix
 {
 public:
 matrix(int dim)
       : dim_(dim)
 {
         data_ = new int[dim_ * dim_];

 }

     inline int dim() const {
                         return dim_;
                 }
                 inline int& operator()(unsigned row, unsigned col) {
                         return data_[dim_*row + col];
                 }

                 inline int operator()(unsigned row, unsigned col) const {
                         return data_[dim_*row + col];
                 }

 private:
     int dim_;
     int* data_;
 };

// ---------------------------------------------------
 void random_matrix(int (&matrix)[MAXDIM][MAXDIM]) {
         for (int r = 0; r < MAXDIM; r++)
                 for (int c = 0; c < MAXDIM; c++)
                         matrix[r][c] = rand() % 100;
 }
 void random_matrix_class(matrix& matrix) {
         for (int r = 0; r < matrix.dim(); r++)
                 for (int c = 0; c < matrix.dim(); c++)
                         matrix(r, c) = rand() % 100;
 }

 template<typename T, typename M>
 float run(T f, M const& a, M const& b, M& c)
 {
         float time = 0;

         for (int i = 0; i < RUNS; i++) {
                 struct timeval start, end;
                 gettimeofday(&start, NULL);
                 f(a,b,c);
                 gettimeofday(&end, NULL);

                 long s = start.tv_sec * 1000 + start.tv_usec / 1000;
                 long e = end.tv_sec * 1000 + end.tv_usec / 1000;

                 time += e - s;
         }
         return time / RUNS;
 }
 // SEQ MULTIPLICATION ----------------------------------------------------------
  int* mult_seq(int const(&a)[MAXDIM][MAXDIM], int const(&b)[MAXDIM][MAXDIM], int (&z)[MAXDIM][MAXDIM]) {
          for (int r = 0; r < MAXDIM; r++) {
                  for (int c = 0; c < MAXDIM; c++) {
                          z[r][c] = 0;
                          for (int i = 0; i < MAXDIM; i++)
                                  z[r][c] += a[r][i] * b[i][c];
                  }
          }
  }
  void mult_std(matrix const& a, matrix const& b, matrix& z) {
          for (int r = 0; r < a.dim(); r++) {
                  for (int c = 0; c < a.dim(); c++) {
                          z(r,c) = 0;
                          for (int i = 0; i < a.dim(); i++)
                                  z(r,c) += a(r,i) * b(i,c);
                  }
          }
  }

  // MAIN ------------------------------------------------------------------------
  using namespace std;
  int main(int argc, char* argv[]) {
          srand(time(NULL));

          int matrix_a[MAXDIM][MAXDIM];
          int matrix_b[MAXDIM][MAXDIM];
          int matrix_c[MAXDIM][MAXDIM];
          random_matrix(matrix_a);
          random_matrix(matrix_b);
          printf("%d ", MAXDIM);
          printf("%f \n", run(mult_seq, matrix_a, matrix_b, matrix_c));

          matrix a(MAXDIM);
          matrix b(MAXDIM);
          matrix c(MAXDIM);
          random_matrix_class(a);
          random_matrix_class(b);
          printf("%d ", MAXDIM);
          printf("%f \n", run(mult_std, a, b, c));

          return 0;
  }

The results now:

$ g++ t1.cpp
$ ./a.exe
512 1270.900000
512 3308.800000

$ g++ -O3 t1.cpp
$ ./a.exe
512 284.900000
512 622.000000

From this we see the C code is about twice as fast as the C++ code when fully optimized. I can not see the reason in the code.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m performing matrix multiplication with this simple algorithm. To be more flexible I used

Leave an answerCancel reply

1 Answer

Edit

Leave an answer
Cancel reply