The Archive Base

Editorial Team
Asked: June 14, 2026

I wrote two matrix multiplication programs in C++: regular MM (source) and Strassen’s MM (source). Both operate on square matrices of size 2^k x 2^k (in other words, square matrices whose side length is a power of two).

The results are terrible. For a 1024 x 1024 matrix, regular MM takes 46.381 s, while Strassen’s MM takes 1484.303 s (about 25 minutes!).

I tried to keep the code as simple as possible. Other Strassen’s MM examples found on the web are not much different from my code. One issue with Strassen’s code is obvious – I don’t have a cutoff point that switches to regular MM.

What other issues does my Strassen’s MM code have?

Thanks!

Direct links to sources
http://pastebin.com/HqHtFpq9
http://pastebin.com/USRQ5tuy

Edit 1.
First, a lot of great advice. Thank you for taking the time to share your knowledge.

I implemented the changes (keeping all of my code) and added a cutoff point.
MM of a 2048×2048 matrix with a cutoff of 512 already gives good results:
Regular MM: 191.49 s
Strassen’s MM: 112.179 s
A significant improvement.
The results were obtained on a prehistoric Lenovo X61 Tablet PC with an Intel Centrino processor, using Visual Studio 2012.
I will do more checks (to make sure I got correct results) and will publish the results.

1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 9:41 pm

    “One issue with Strassen’s code is obvious – I don’t have a cutoff point that switches to regular MM.”

    It’s fair to say that recursing all the way down to a single element is the bulk of (if not the entire) problem. Trying to guess at other performance bottlenecks without addressing this first is almost moot because of the massive performance hit it brings. (In other words, you’re comparing apples to oranges.)

    As discussed in the comments, cache alignment could have an effect, but not on this scale. Furthermore, cache alignment would likely hurt the regular algorithm more than the Strassen algorithm, since the latter is cache-oblivious.

    void strassen(int **a, int **b, int **c, int tam) {

        // Trivial case: the matrix is 1 x 1.
        if (tam == 1) {
            c[0][0] = a[0][0] * b[0][0];
            return;
        }
        // ... (recursive case not shown in this excerpt)
    }

    That’s far too small. While the Strassen algorithm has a lower asymptotic complexity, it has a much bigger Big-O constant. For one, you pay function call overhead all the way down to 1 element.

    This is analogous to using merge or quick sort and recursing all the way down to one element. To be efficient you need to stop the recursion when the size gets small and fall back to the classic algorithm.

    In quick/merge sort, you’d fall back to a low-overhead O(n^2) insertion or selection sort. Here you would fall back to the normal O(n^3) matrix multiply.
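A minimal sketch of such a fallback, assuming square matrices stored as vectors of vectors of `long long`. The names `Matrix`, `multiplyClassic`, `combine`, `quadrant`, `multiplyStrassen` and the `cutoff` parameter are made up for illustration and are not taken from the pastebin sources; the quadrant copies favor clarity over speed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types and names for illustration; not the asker's code.
using Row = std::vector<long long>;
using Matrix = std::vector<Row>;

// Classic O(n^3) multiply of two square matrices of equal size.
Matrix multiplyClassic(const Matrix& a, const Matrix& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    Matrix c(n, Row(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Returns a + sign * b, element by element (sign is +1 or -1).
Matrix combine(const Matrix& a, const Matrix& b, long long sign) {
    const std::size_t n = a.size();
    Matrix r(n, Row(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            r[i][j] = a[i][j] + sign * b[i][j];
    return r;
}

// Copies the quadrant at block position (row, col), each 0 or 1.
Matrix quadrant(const Matrix& m, std::size_t row, std::size_t col) {
    const std::size_t h = m.size() / 2;
    Matrix q(h, Row(h));
    for (std::size_t i = 0; i < h; ++i)
        for (std::size_t j = 0; j < h; ++j)
            q[i][j] = m[row * h + i][col * h + j];
    return q;
}

// Strassen multiply that hands small (or odd-sized) blocks to the classic loop.
Matrix multiplyStrassen(const Matrix& a, const Matrix& b, std::size_t cutoff) {
    const std::size_t n = a.size();
    if (n <= cutoff || n % 2 != 0) return multiplyClassic(a, b);
    const std::size_t h = n / 2;
    const Matrix a11 = quadrant(a, 0, 0), a12 = quadrant(a, 0, 1),
                 a21 = quadrant(a, 1, 0), a22 = quadrant(a, 1, 1);
    const Matrix b11 = quadrant(b, 0, 0), b12 = quadrant(b, 0, 1),
                 b21 = quadrant(b, 1, 0), b22 = quadrant(b, 1, 1);
    // The seven half-size products.
    const Matrix m1 = multiplyStrassen(combine(a11, a22, 1), combine(b11, b22, 1), cutoff);
    const Matrix m2 = multiplyStrassen(combine(a21, a22, 1), b11, cutoff);
    const Matrix m3 = multiplyStrassen(a11, combine(b12, b22, -1), cutoff);
    const Matrix m4 = multiplyStrassen(a22, combine(b21, b11, -1), cutoff);
    const Matrix m5 = multiplyStrassen(combine(a11, a12, 1), b22, cutoff);
    const Matrix m6 = multiplyStrassen(combine(a21, a11, -1), combine(b11, b12, 1), cutoff);
    const Matrix m7 = multiplyStrassen(combine(a12, a22, -1), combine(b21, b22, 1), cutoff);
    // Assemble the four quadrants of the result.
    Matrix c(n, Row(n));
    for (std::size_t i = 0; i < h; ++i)
        for (std::size_t j = 0; j < h; ++j) {
            c[i][j] = m1[i][j] + m4[i][j] - m5[i][j] + m7[i][j];
            c[i][j + h] = m3[i][j] + m5[i][j];
            c[i + h][j] = m2[i][j] + m4[i][j];
            c[i + h][j + h] = m1[i][j] - m2[i][j] + m3[i][j] + m6[i][j];
        }
    return c;
}
```

Calling `multiplyStrassen(a, b, 64)` and timing it for several cutoff values is one way to look for the crossover on a given machine; 64 is an arbitrary starting guess, not a measured optimum.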


    The size at which you fall back to the classic algorithm should be a tunable threshold that will likely vary depending on the hardware and on how well the compiler can optimize the code.

    For something like Strassen multiplication, where the advantage is only O(n^2.8074) over the classic O(n^3), don’t be surprised if this threshold turns out to be very high. (Thousands of elements?)
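As a rough back-of-the-envelope check on how small that advantage is, one can count scalar multiplications only, assuming recursion all the way down to 1 x 1 and using 2.8074 ≈ log2(7). This is a simplification: it ignores the 18 extra half-size matrix additions and subtractions that Strassen performs at every recursion level.

```latex
\frac{n^{3}}{n^{\log_2 7}} = \left(\frac{8}{7}\right)^{\log_2 n},
\qquad
n = 1024:\ \left(\frac{8}{7}\right)^{10} \approx 3.80,
\qquad
n = 2048:\ \left(\frac{8}{7}\right)^{11} \approx 4.34
```

So at these sizes the classic algorithm performs only about four times as many multiplications, which leaves little room for per-call overhead before the advantage disappears.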


    In some applications there can be many algorithms, each with decreasing complexity but an increasing Big-O constant. The result is that different algorithms become optimal at different sizes.

    Large integer multiplication is a notorious example of this:

    • Grade-school Multiplication: O(N^2) optimal for < ~100 digits*
    • Karatsuba Multiplication: O(N^1.585) faster than above at ~100 digits*
    • Toom-Cook 3-way: O(N^1.465) faster than Karatsuba at ~3000 digits*
    • Floating-point FFT: O(> N log(N)) faster than Karatsuba/Toom-3 at ~700 digits*
    • Schönhage–Strassen algorithm (SSA): O(N log(N) loglog(N)) faster than FFT at ~ a billion digits*
    • Fixed-width Number-Theoretic Transform: O(N log(N)) faster than SSA at ~ a few billion digits?*

    *Note these example thresholds are approximate and can vary drastically – often by more than a factor of 10.
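The first two entries of the list can be sketched with the same cutoff pattern. This is a rough illustration only, assuming numbers are stored as little-endian vectors of small digits and leaving carry propagation out to keep it short; `Digits`, `multiplySchoolbook`, `multiplyKaratsuba` and `cutoff` are hypothetical names, and real big-integer libraries are considerably more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation: little-endian digits, carries not propagated.
using Digits = std::vector<std::int64_t>;

// Grade-school O(N^2) product of two digit vectors.
Digits multiplySchoolbook(const Digits& a, const Digits& b) {
    if (a.empty() || b.empty()) return {};
    Digits r(a.size() + b.size() - 1, 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            r[i + j] += a[i] * b[j];
    return r;
}

// Karatsuba product that falls back to the grade-school loop at or below cutoff.
Digits multiplyKaratsuba(Digits a, Digits b, std::size_t cutoff) {
    const std::size_t n = std::max(a.size(), b.size());
    a.resize(n, 0);  // pad the shorter operand with leading zeros
    b.resize(n, 0);
    if (n <= cutoff || n < 2) return multiplySchoolbook(a, b);
    const std::size_t h = n / 2;  // low half has h digits, high half n - h
    const Digits aLo(a.begin(), a.begin() + h), aHi(a.begin() + h, a.end());
    const Digits bLo(b.begin(), b.begin() + h), bHi(b.begin() + h, b.end());
    const Digits lo = multiplyKaratsuba(aLo, bLo, cutoff);
    const Digits hi = multiplyKaratsuba(aHi, bHi, cutoff);
    // (aLo + aHi) and (bLo + bHi); the high halves are at least as long.
    Digits aSum = aHi, bSum = bHi;
    for (std::size_t i = 0; i < h; ++i) {
        aSum[i] += aLo[i];
        bSum[i] += bLo[i];
    }
    Digits mid = multiplyKaratsuba(aSum, bSum, cutoff);
    assert(lo.size() <= mid.size() && hi.size() <= mid.size());
    // Subtract lo and hi so that mid holds only the cross terms.
    for (std::size_t i = 0; i < lo.size(); ++i) mid[i] -= lo[i];
    for (std::size_t i = 0; i < hi.size(); ++i) mid[i] -= hi[i];
    // Recombine: lo + mid * x^h + hi * x^(2h).
    Digits r(2 * n - 1, 0);
    for (std::size_t i = 0; i < lo.size(); ++i) r[i] += lo[i];
    for (std::size_t i = 0; i < mid.size(); ++i) r[i + h] += mid[i];
    for (std::size_t i = 0; i < hi.size(); ++i) r[i + 2 * h] += hi[i];
    return r;
}
```

Three recursive calls on half-length inputs replace the four that a naive split would need, which is where the O(N^1.585) bound comes from; the `cutoff` argument plays the same role as the ~100-digit threshold quoted in the list and would need to be measured rather than assumed.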
