This is the code for a matrix multiplication program ex implicit none real ::

Question

0

Asked: May 31, 20262026-05-31T16:58:50+00:00 2026-05-31T16:58:50+00:00

This is the code for a matrix multiplication program ex implicit none real ::

0

This is the code for a matrix multiplication

 program ex
    implicit none
    real :: a(256,256),b(256,256),c(256,256),t1,t2
    integer i,j,k,sum
    sum=0

    do j = 1,256
      do i = 1,256
        a(i,j) = 1
        b(i,j) = 1
        c(i,j) = 0.0
      enddo
    enddo

    call cpu_time(t1)
    !$acc region do

    do i=1,256
      do j=1,256
        sum=0
        do k=1,256
          sum=sum+a(i,k)*b(k,j)
          c(i,j)=sum
        end do
      end do
    end do
    !$acc end region
    call cpu_time(t2)
    print*,"cpu time=",t2-t1
    print*,c
  end program ex

When I execute this the execution time is 75 msec when using the accelerator directives and the PGI compiler. But when I run same matrix multiplication with a “cuda fortran” implementation the execution time is only 5msec. So there is big difference even though I used the accelerator directives. So I doubt that my accelerator directives are working properly.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-31T16:58:51+00:00

I tried to accelerate your program using very similar accelerator directives OpenHMPP. Note that I switched one your line, that is probably errorneously in the innermost loop. Also note, that I had to advice the compiler of the reduction taking place. Also I renamed the reduction variable, because it shadowed the sum intrinsic function.

The performance is not good, because of the overheead with starting the GPU kernel and because of the memory transfers. You need orders of magnitude more work for it to be profitable to use GPU.

For example when I used matrices 2000 x 2000 then the CPU execution time was 41 seconds, but GPU execution time only 8 s.

 program ex
    implicit none
    real :: a(256,256),b(256,256),c(256,256),t1,t2
    integer i,j,k,sm

      sm=0
      do j = 1,256
          do i = 1,256
             a(i,j) = 1
             b(i,j) = 1
             c(i,j) = 0.0
          enddo
       enddo
       call cpu_time(t1)
     !$hmpp region, target = CUDA
      !$hmppcg gridify, reduce(+:sm)
      do i=1,256

          do j=1,256

               sm=0
               do k=1,256

                   sm=sm+a(i,k)*b(k,j)
               end do
               c(i,j)=sm
          end do
      end do
     !$hmpp endregion
      call cpu_time(t2)
      print*,"cpu time=",t2-t1
      print*,sum(c)
end program ex

edit: it would be probably not to use reduce(+:sm), but just private(sm)

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

This is the code for a matrix multiplication program ex implicit none real ::

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply