How can I improve the efficiency of the standard matrix multiplication algorithm?
The main operation in this approach is: C[i][j] += A[i][p] * B[p][j]
What can be done to improve the efficiency of the algorithm?
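For reference, here is a minimal sketch of the standard triple-loop algorithm the question describes. It assumes square n x n matrices stored row-major in flat arrays, and the function name `matmul_naive` is hypothetical. Note that the innermost loop walks B down a column (stride n), which is unfriendly to caches and to SIMD units.

```c
#include <stddef.h>

/* Naive n x n multiply, C = A * B, row-major flat storage (assumption).
   The inner statement is the C[i][j] += A[i][p] * B[p][j] update. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            C[i * n + j] = 0.0;
            for (size_t p = 0; p < n; ++p) {
                /* B[p][j] is read with stride n as p advances */
                C[i * n + j] += A[i * n + p] * B[p * n + j];
            }
        }
    }
}
```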
You might want to look at using a BLAS (Basic Linear Algebra Subprograms) library. Intel offers MKL, AMD has ACML, and there is also the open-source GotoBLAS.
The (dense) matrix-matrix multiply kernel is a `?GEMM` call, where the `?` indicates the floating-point type. For example, `DGEMM` calls the `double` routine. Unless you're extremely confident you know what you're doing with low-level optimisations, these libraries will probably offer better performance than anything you can code by hand.
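To make clear what a `DGEMM` call computes, here is a plain-C sketch of its semantics for the no-transpose case, C := alpha * A * B + beta * C. It is not any library's implementation; the name `ref_dgemm_nn` and the row-major layout are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical reference for DGEMM semantics (no transpose, row-major):
   A is M x K (leading dimension lda), B is K x N (ldb), C is M x N (ldc).
   Computes C := alpha * A * B + beta * C. */
void ref_dgemm_nn(size_t M, size_t N, size_t K, double alpha,
                  const double *A, size_t lda,
                  const double *B, size_t ldb,
                  double beta, double *C, size_t ldc)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < K; ++p)
                acc += A[i * lda + p] * B[p * ldb + j];

            double *c = &C[i * ldc + j];
            /* As in reference BLAS, C is not read when beta is zero,
               so its input contents need not be initialised. */
            *c = (beta == 0.0) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
}
```

With a CBLAS interface, the equivalent library call would be `cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc)`, with the dimensions passed as `int`; the header to include depends on which BLAS you link against.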
If you do want to have a go at coding this yourself, then you may want to consider the following:
- SSE and SSE2 through SSE4 instructions are widely supported, and some newer CPUs also support AVX instructions. This reference might give you an idea of the current state of things:
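A minimal sketch of applying SSE2 to the C[i][j] += A[i][p] * B[p][j] update follows, assuming an x86 compiler that provides `<emmintrin.h>` and defines `__SSE2__`; otherwise only the scalar loop runs. It also swaps the loop order to i-p-j, a technique not named in the original answer, so that the innermost loop walks B and C contiguously and two doubles can be processed per instruction. The function name `matmul_ipj_sse2` is hypothetical.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* C = A * B for n x n row-major matrices using i-p-j loop order,
   so the innermost loop has stride-1 access to B and C. */
void matmul_ipj_sse2(size_t n, const double *A, const double *B, double *C)
{
    for (size_t k = 0; k < n * n; ++k)
        C[k] = 0.0;

    for (size_t i = 0; i < n; ++i) {
        double *Ci = C + i * n;              /* row i of C */
        for (size_t p = 0; p < n; ++p) {
            const double a = A[i * n + p];   /* A[i][p] is fixed across j */
            const double *Bp = B + p * n;    /* row p of B */
            size_t j = 0;
#if defined(__SSE2__)
            const __m128d va = _mm_set1_pd(a);   /* broadcast a to both lanes */
            for (; j + 2 <= n; j += 2) {
                __m128d vb = _mm_loadu_pd(Bp + j);
                __m128d vc = _mm_loadu_pd(Ci + j);
                vc = _mm_add_pd(vc, _mm_mul_pd(va, vb));
                _mm_storeu_pd(Ci + j, vc);
            }
#endif
            /* Scalar tail for odd n, or the whole row without SSE2 */
            for (; j < n; ++j)
                Ci[j] += a * Bp[j];
        }
    }
}
```

Unaligned loads and stores keep the sketch simple; with AVX the same pattern would handle four doubles per instruction.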
Hope this helps.