copy of the array. Thus, there is no need to send data via mpi_bcast() in lines
26-28; note that the mpi_bcast() subroutines are commented out, and, if used, they
would send only the required data to the appropriate processors. The matrix-vector
product is done by computing a linear combination of the columns of the matrix.
The linear combination is partitioned to obtain the parallel computation. Here
these calculations are done on each processor either by the BLAS2 subroutine
sgemv() (see http://www.netlib.org/blas/sgemv.f) in line 29 or by the ji-loops
in lines 30-34. Then mpi_reduce() in line 36 is used to send n real numbers
(a column vector) to processor 0, where they are summed to form the product
vector. The mflops (million floating point operations per second) are computed
in line 42, where the timings are in milliseconds and there are 1000 repetitions
of the matrix-vector product.
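The ji-loop portion of this computation and the reduction can be pictured with
the following sketch. It reuses the variable names from the listing (a, x, prodt,
prod, n, bn, en) and is only a sketch, not necessarily identical to lines 29-36
of matvecmpi.f.

      prodt = 0.0
      ! each processor sweeps over its own block of columns j = bn,...,en
      do j = bn,en
         do i = 1,n
            prodt(i) = prodt(i) + a(i,j)*x(j)
         end do
      end do
      ! the p partial products are summed into prod on processor 0
      call mpi_reduce(prodt,prod,n,mpi_real,mpi_sum,0,mpi_comm_world,ierr)

Each pass over these loops costs about 2nm floating point operations, so 1000
repetitions timed in milliseconds correspond to a rate of roughly 2nm divided by
the elapsed time in milliseconds, in mflops.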
MPI/Fortran Code matvecmpi.f
1. program matvec
2. implicit none
3. include 'mpif.h'
4. real,dimension(1:1024,1:4096):: a
5. real,dimension(1:1024)::prod,prodt
6. real,dimension(1:4096)::x
7. real:: t1,t2,mflops
8. real:: timef
9. integer:: my_rank,p,n,source,dest,tag,ierr,loc_m
10. integer:: i,status(mpi_status_size),bn,en,j,it,m
11. data n,dest,tag/1024,0,50/
12. m = 4*n
13. a = 1.0
14. prod = 0.0
15. x = 3.0
16. call mpi_init(ierr)
17. call mpi_comm_rank(mpi_comm_world,my_rank,ierr)
18. call mpi_comm_size(mpi_comm_world,p,ierr)
19. loc_m = m/p
20. bn = 1+(my_rank)*loc_m
21. en = bn + loc_m - 1
22. if (my_rank.eq.0) then
23. t1 = timef()
24. end if
25. do it = 1,1000
26. ! call mpi_bcast(a(1,bn),n*(en-bn+1),mpi_real,0,
mpi_comm_world,ierr)
27. ! call mpi_bcast(prod(1),n,mpi_real,0,
mpi_comm_world,ierr)
28. ! call mpi_bcast(x(bn),(en-bn+1),mpi_real,0,
mpi_comm_world,ierr)