Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7901419
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 3, 20262026-06-03T09:12:49+00:00 2026-06-03T09:12:49+00:00

I have a school project to do matrix multiplication on a hpc distributed system.

  • 0

I have a school project to do matrix multiplication on a hpc distributed system.

I need to read in a matrix from a parallel IO system and use pblacs to perform the matrix multiplication in parallel on many compute nodes(processors). The data must be read in using MPI IO commands. I know PBlacs uses block cyclic distributions to perform the multiplication.

The professor has not given us much info on MPI IO, and I am having trouble finding much information/resources on it. Specifically, are there ways to read in a matrix from a parallel io system in a block cyclic manner and easily plug that into pblacs pdgemm?

Any pointers to useful resources would be much appreciated. I am a bit short on time, and getting frustrated with the lack of direction on this project.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-03T09:12:51+00:00Added an answer on June 3, 2026 at 9:12 am

    This is actually relatively straightforward to do (if you already know something about blacs/scalapack and mpi-io!) but even then the documentation – even online – is as you’ve discovered, somewhat poor.

    The first thing to know about MPI-IO is that it lets you use normal MPI data types to specify each process’ “view” of the file, and then read only the data that falls into that view. At our centre we have slides and source code for a half-day course on parallel IO; the first third or so is about the basics of MPI-IO. There are slides here and sample source code here.

    The second thing to know is that MPI has a built-in way to create “distributed array” data types, one combination of which lets you lay out a block-cyclic data distribution; that’s discussed in general terms in my answer to this question: What is the difference between darray and subarray in mpi? .

    So that means if you have a binary file containing a big matrix, you can read it in with mpi-io using MPI_Type_create_darray and it’ll be distributed by tasks in a block-cyclic way. Then it’s just a matter of doing the blacs or scalapack call. An example program of using the scalapack psgemv for matrix-vector multiplication rather than psgemm is listed
    in my answer to a question on the Computational Science stack exchange.

    To give you an idea of how the pieces fit together, the following is a simple program which reads in a binary file containing a matrix (first the size of the square matrix N and then the N^2 elements) and then calculates the eigenvalues and vectors using scalapack’s (new) pssyevr routine. It combines the MPI-IO, darray, and scalapack stuff. It’s in Fortran, but the function calls are the same in C-based languages.

    !
    ! Use MPI-IO to read a diagonal matrix distributed block-cyclically,
    ! use Scalapack to calculate its eigenvalues, and compare
    ! to expected results.
    !
    program darray
          use mpi
          implicit none
    
          integer :: n, nb    ! problem size and block size
          integer :: myArows, myAcols   ! size of local subset of global array
          real :: p
          real, dimension(:), allocatable :: myA, myZ
          real, dimension(:), allocatable :: work
          integer, dimension(:), allocatable :: iwork
          real, dimension(:), allocatable :: eigenvals
          real, dimension(:), allocatable :: expected
          integer :: worksize, totwork, iworksize
    
          integer, external :: numroc   ! blacs routine
          integer :: me, procs, icontxt, prow, pcol, myrow, mycol  ! blacs data
          integer :: info    ! scalapack return value
          integer, dimension(9)   :: ides_a, ides_z ! scalapack array desc
          integer :: clock
          real :: calctime, iotime
    
          character(len=128) :: filename
          integer :: mpirank
          integer :: ierr
          integer, dimension(2) :: pdims, dims, distribs, dargs
          integer :: infile
          integer, dimension(MPI_STATUS_SIZE) :: mpistatus
          integer :: darray
          integer :: locsize, nelements
          integer(kind=MPI_ADDRESS_KIND) :: lb, locextent
          integer(kind=MPI_OFFSET_KIND) :: disp
          integer :: nargs
          integer :: m, nz
    
    ! Initialize MPI (for MPI-IO)
    
          call MPI_Init(ierr)
          call MPI_Comm_size(MPI_COMM_WORLD,procs,ierr)
          call MPI_Comm_rank(MPI_COMM_WORLD,mpirank,ierr)
    
    ! May as well get the process grid from MPI_Dims_create
          pdims = 0
          call MPI_Dims_create(procs, 2, pdims, ierr)
          prow = pdims(1)
          pcol = pdims(2)
    
    ! get filename
          nargs = command_argument_count()
          if (nargs /= 1) then
              print *,'Usage: darray filename'
              print *,'       Where filename = name of binary matrix file.'
              call MPI_Abort(MPI_COMM_WORLD,1,ierr)
          endif
          call get_command_argument(1, filename)
    
    ! find the size of the array we'll be using
    
          call tick(clock)
          call MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY, MPI_INFO_NULL, infile, ierr)
          call MPI_File_read_all(infile,n,1,MPI_INTEGER,mpistatus,ierr)
          call MPI_File_close(infile,ierr)
    
    ! create the darray that will read in the data.
    
          nb = 64
          if (nb > (N/prow)) nb = N/prow
          if (nb > (N/pcol)) nb = N/pcol
          dims = [n,n]
          distribs = [MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC]
          dargs = [nb, nb]
    
          call MPI_Type_create_darray(procs, mpirank, 2, dims, distribs, dargs, &
                                      pdims, MPI_ORDER_FORTRAN, MPI_REAL, darray, ierr)
          call MPI_Type_commit(darray,ierr)
    
          call MPI_Type_size(darray, locsize, ierr)
          nelements = locsize/4
          call MPI_Type_get_extent(darray, lb, locextent, ierr)
    
    ! Initialize local arrays    
    
          allocate(myA(nelements))
          allocate(myZ(nelements))
          allocate(eigenvals(n), expected(n))
    
    ! read in the data
          call MPI_File_open(MPI_COMM_WORLD, trim(filename), MPI_MODE_RDONLY, MPI_INFO_NULL, infile, ierr)
          disp = 4   ! skip N = 4 bytes
          call MPI_File_set_view(infile, disp, MPI_REAL, darray, "native", MPI_INFO_NULL, ierr)
          call MPI_File_read_all(infile, myA, nelements, MPI_REAL, mpistatus, ierr)
          call MPI_File_close(infile,ierr)
    
          iotime = tock(clock)
          if (mpirank == 0) then
              print *,'I/O time      = ', iotime
          endif
    
    ! Initialize blacs processor grid
    
          call tick(clock)
          call blacs_pinfo   (me,procs)
    
          call blacs_get     (-1, 0, icontxt)
          call blacs_gridinit(icontxt, 'R', prow, pcol)
          call blacs_gridinfo(icontxt, prow, pcol, myrow, mycol)
    
          myArows = numroc(n, nb, myrow, 0, prow)
          myAcols = numroc(n, nb, mycol, 0, pcol)
    
    ! Construct local arrays
    ! Global structure:  matrix A of n rows and n columns
    
    ! Prepare array descriptors for ScaLAPACK 
          call descinit( ides_a, n, n, nb, nb, 0, 0, icontxt, myArows, info )
          call descinit( ides_z, n, n, nb, nb, 0, 0, icontxt, myArows, info )
    
    ! Call ScaLAPACK library routine
    
          allocate(work(1), iwork(1))
          iwork(1) = -1
          work(1)  = -1.
          call pssyevr( 'V', 'A', 'U', n, myA, 1, 1, ides_a, -1.e20, +1.e20, 1, n, &
                         m,  nz, eigenvals, myZ, 1, 1, ides_z, work, -1,           &
                         iwork, -1, info )
          worksize  = int(work(1))/2*3
          iworksize = iwork(1)/2*3
          print *, 'Local workspace ', worksize
          call MPI_Reduce(worksize, totwork, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
          if (mpirank == 0) print *, ' total work space ', totwork
          call MPI_Reduce(iworksize, totwork, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
          if (mpirank == 0) print *, ' total iwork space ', totwork
          deallocate(work,iwork)
          allocate(work(worksize),iwork(iworksize))
          call pssyev('N','U',n,myA,1,1,ides_a,eigenvals,myZ,1,1,ides_z,work,worksize,info)
          if (info /= 0) then
             print *, 'Error: info = ', info
          else if (mpirank == 0) then
             print *, 'Calculated ', m, ' eigenvalues and ', nz, ' eigenvectors.'
          endif
    
    ! Deallocate the local arrays
    
          deallocate(myA, myZ)
          deallocate(work, iwork)
    
    ! End blacs for processors that are used
    
          call blacs_gridexit(icontxt)
          calctime = tock(clock)
    
    ! calculated the expected eigenvalues for a particular test matrix
    
          p = 3. + sqrt((4. * n - 3.) * (n - 1.)*3./(n+1.))
          expected(1) = p/(n*(5.-2.*n))
          expected(2) = 6./(p*(n + 1.))
          expected(3:n) = 1.
    
    ! Print results
    
          if (me == 0) then
            if (info /= 0) then
                 print *, 'Error -- info = ', info
            endif
            print *,'Eigenvalues L_infty err = ', &
              maxval(abs(eigenvals-expected))
            print *,'Compute time = ', calctime
          endif
    
          deallocate(eigenvals, expected)
    
          call MPI_Finalize(ierr)
    
    
    contains
        subroutine tick(t)
            integer, intent(OUT) :: t
    
            call system_clock(t)
        end subroutine tick
    
        ! returns time in seconds from now to time described by t
        real function tock(t)
            integer, intent(in) :: t
            integer :: now, clock_rate
    
            call system_clock(now,clock_rate)
    
            tock = real(now - t)/real(clock_rate)
        end function tock
    
    end program darray
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

For a school project I have to create a distributed system from a standalone
I have a project for school and I have to use Java. Recently I
I'm currently working on a school (matrix multiplier) project and I have a problem.
I have an interest in creating a NoSQL database system for a school project
I have this school project I'm making, where I need to make my code
I need to develop a Iphone 2D app for school project. I don't have
Working on a school project, the program is supposed to read from a text
I have a website for school project, written in html,jsp and I use mysql
I am trying to use GLFW for a school project, and have followed these
I have a school project in which I have to implement a chat application,

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.