of two components, which is dictated by the count parameter in mpi_scatter()
in line 19. The two real numbers are stored in the first two components of the
local arrays, a_loc. The components a_loc(2:7) are not defined, and the print
commands in line 20 verify this.
MPI/Fortran 9x Code scatmpi.f
1. program scatmpi
2.! Illustrates mpi_scatter.
3. implicit none
4. include ’mpif.h’
5. real, dimension(0:7):: a_list,a_loc
6. integer:: my_rank,p,n,source,dest,tag,ierr,loc_n
7. integer:: i,status(mpi_status_size)
8. data n,dest,tag/1024,0,50/
9. call mpi_init(ierr)
10. call mpi_comm_rank(mpi_comm_world,my_rank,ierr)
11. call mpi_comm_size(mpi_comm_world,p,ierr)
12. if (my_rank.eq.0) then
13. do i = 0,7
14. a_list(i) = i
15. end do
16. end if
17.! The array, a_list, is sent and received in groups of
18.! two to the other processors and stored in a_loc.
19. call mpi_scatter(a_list,2,mpi_real,a_loc,2,mpi_real,0,&
mpi_comm_world,ierr)
20. print*, ’my_rank =’,my_rank,’a_loc = ’, a_loc
21. call mpi_finalize(ierr)
22. end program scatmpi
my_rank = 0 a_loc = 0.0000000000E+00 1.000000000
!
0.2347455187E-40 0.1010193260E-38 -0.8896380928E+10
-0.2938472521E+30 0.3083417141E-40 0.1102030158E-38
!
my_rank = 1 a_loc = 2.000000000 3.000000000
!
0.2347455187E-40 0.1010193260E-38 -0.8896380928E+10
-0.2947757071E+30 0.3083417141E-40 0.1102030158E-38
!
my_rank = 2 a_loc = 4.000000000 5.000000000
!
0.2347455187E-40 0.1010193260E-38 -0.8896380928E+10
-0.2949304496E+30 0.3083417141E-40 0.1102030158E-38
!
my_rank = 3 a_loc = 6.000000000 7.000000000
!
0.2347455187E-40 0.1010193260E-38 -0.8896380928E+10
-0.3097083589E+30 0.3083417141E-40 0.1102030158E-38
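The undefined components can be avoided by dimensioning the local array to match the scattered chunk. The following is a minimal sketch of such a variant (the program name scatfit and the assumption of p = 4 processors, so that the eight numbers are scattered two per processor, are for illustration only; it is not one of the book's codes):

program scatfit
! Sketch only: scatter eight numbers, two per processor, into a local
! array whose size equals the chunk so that every component is defined.
! Assumes p = 4 processors.
implicit none
include 'mpif.h'
real, dimension(0:7):: a_list
real, dimension(0:1):: a_loc
integer:: my_rank,p,ierr,i
call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world,my_rank,ierr)
call mpi_comm_size(mpi_comm_world,p,ierr)
if (my_rank.eq.0) then
   do i = 0,7
      a_list(i) = i
   end do
end if
call mpi_scatter(a_list,2,mpi_real,a_loc,2,mpi_real,0,&
                 mpi_comm_world,ierr)
print*,'my_rank =',my_rank,'a_loc = ',a_loc
call mpi_finalize(ierr)
end program scatfit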
7.3.4 Illustrations of mpi_gather()
The second code gathmpi.f collects some of the data, loc_n, loc_a and loc_b,
which are computed in lines 15-17 on each processor. In particular, all the values
of loc_a are sent and stored in the array a_list on processor 0. This is done by
mpi_gather() on line 23, where count is equal to one and the root processor is
zero. This is verified by the print commands in lines 18-20 and 25-29.
MPI/Fortran 9x Code gathmpi.f
1. program gathmpi
2.! Illustrates mpi_gather.
3. implicit none
4. include ’mpif.h’
5. real:: a,b,h,loc_a,loc_b,total
6. real, dimension(0:31):: a_list
7. integer:: my_rank,p,n,source,dest,tag,ierr,loc_n
8. integer:: i,status(mpi_status_size)
9. data a,b,n,dest,tag/0.0,100.0,1024,0,50/
10. call mpi_init(ierr)
11. call mpi_comm_rank(mpi_comm_world,my_rank,ierr)
12. call mpi_comm_size(mpi_comm_world,p,ierr)
13. h = (b-a)/n
14.! Each processor has a unique loc_n, loc_a and loc_b
15. loc_n = n/p
16. loc_a = a+my_rank*loc_n*h
17. loc_b = loc_a + loc_n*h
18. print*,’my_rank =’,my_rank, ’loc_a = ’,loc_a
19. print*,’my_rank =’,my_rank, ’loc_b = ’,loc_b
20. print*,’my_rank =’,my_rank, ’loc_n = ’,loc_n
21.! The loc_a are sent and received into an array, a_list, on
22.! processor 0.
23. call mpi_gather(loc_a,1,mpi_real,a_list,1,mpi_real,0,&
mpi_comm_world,ierr)
24. call mpi_barrier(mpi_comm_world,ierr)
25. if (my_rank.eq.0) then
26. do i = 0,p-1
27. print*, ’a_list(’,i,’) = ’,a_list(i)
28. end do
29. end if
30. call mpi_finalize(ierr)
31. end program gathmpi
my_rank = 0 loc_a = 0.0000000000E+00
my_rank = 0 loc_b = 25.00000000
my_rank = 0 loc_n = 256
my_rank = 1 loc_a = 25.00000000
my_rank = 1 loc_b = 50.00000000
my_rank = 1 loc_n = 256
my_rank = 2 loc_a = 50.00000000
my_rank = 2 loc_b = 75.00000000
my_rank = 2 loc_n = 256
my_rank = 3 loc_a = 75.00000000
my_rank = 3 loc_b = 100.0000000
my_rank = 3 loc_n = 256
!
a_list( 0 ) = 0.0000000000E+00
a_list( 1 ) = 25.00000000
a_list( 2 ) = 50.00000000
a_list( 3 ) = 75.00000000
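As a check on this output, note that with a = 0, b = 100, n = 1024 and p = 4 we have h = (b - a)/n = 100/1024 and loc_n = n/p = 256, so that loc_a = a + my_rank*loc_n*h = 25*my_rank and loc_b = loc_a + loc_n*h = loc_a + 25. These values agree with the printed loc_a and loc_b and with the gathered a_list(0:3).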
The third version of a parallel dot product, in dot3mpi.f, uses mpi_gather()
to collect the local dot products that have been computed concurrently in
lines 25-27. The local dot products, loc_dot, are sent and stored in the
array loc_dots(0:31) on processor 0. This is done by the call to mpi_gather()
on line 31, where the count parameter is equal to one and the root processor is
zero. Lines 33-36 sum the local dot products, and the print commands in lines
21-23 and 33-36 confirm this.
MPI/Fortran 9x Code dot3mpi.f
1. program dot3mpi
2.! Illustrates dot product via mpi_gather.
3. implicit none
4. include ’mpif.h’
5. real:: loc_dot,dot
6. real, dimension(0:31):: a,b, loc_dots
7. integer:: my_rank,p,n,source,dest,tag,ierr,loc_n
8. integer:: i,status(mpi_status_size),en,bn
9. data n,dest,tag/8,0,50/
10. do i = 1,n
11. a(i) = i
12. b(i) = i+1
13. end do
14. call mpi_init(ierr)
15. call mpi_comm_rank(mpi_comm_world,my_rank,ierr)
16. call mpi_comm_size(mpi_comm_world,p,ierr)
17.! Each processor computes a local dot product
18. loc_n = n/p
19. bn = 1+(my_rank)*loc_n
20. en = bn + loc_n-1
21. print*,’my_rank =’,my_rank, ’loc_n = ’,loc_n
22. print*,’my_rank =’,my_rank, ’bn = ’,bn
23. print*,’my_rank =’,my_rank, ’en = ’,en
24. loc_dot = 0.0
25. do i = bn,en
26. loc_dot = loc_dot + a(i)*b(i)
27. end do
28. print*,’my_rank =’,my_rank, ’loc_dot = ’,loc_dot
29.! mpi_gather sends and receives all local dot products
30.! to the array loc_dots in processor 0.
31. call mpi_gather(loc_dot,1,mpi_real,loc_dots,1,mpi_real,0,&
mpi_comm_world,ierr)
32.! Processor 0 sums the local dot products.
33. if (my_rank.eq.0) then
34. dot = loc_dot + sum(loc_dots(1:p-1))
35. print*, ’dot product = ’,dot
36. end if
37. call mpi_finalize(ierr)
38. end program dot3mpi
my_rank = 0 loc_n = 2
my_rank = 0 bn = 1
my_rank = 0 en = 2
my_rank = 1 loc_n = 2
my_rank = 1 bn = 3
my_rank = 1 en = 4
my_rank = 2 loc_n = 2
my_rank = 2 bn = 5
my_rank = 2 en = 6
my_rank = 3 loc_n = 2
my_rank = 3 bn = 7
my_rank = 3 en = 8
!
my_rank = 0 loc_dot = 8.000000000
my_rank = 1 loc_dot = 32.00000000
my_rank = 2 loc_dot = 72.00000000
my_rank = 3 loc_dot = 128.0000000
dot product = 240.0000000
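As a check on the gathered result, the vectors are a(i) = i and b(i) = i + 1 for i = 1,...,8, so the dot product is 1*2 + 2*3 + ... + 8*9 = 2 + 6 + 12 + 20 + 30 + 42 + 56 + 72 = 240, which agrees with the printed value.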
Another application of mpi_gather() is in the matrix-matrix product code
mmmpi.f, which was presented in Section 6.5. Here the product BC was formed
by computing in parallel BC(bn:en), and these partial products were communicated
via mpi_gather() to the root processor.
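A minimal sketch of one way to arrange such a gather is given below; it is not the book's mmmpi.f. To keep the gathered blocks contiguous in Fortran's column-major storage, the sketch partitions the product by columns, with bn:en now indexing columns, so that a single mpi_gather() with count = n*loc_n assembles the full product on processor 0. The program name mmgather and the simple test matrices are assumptions for illustration.

program mmgather
! Sketch only: each processor forms a column block of c = a*b and
! mpi_gather() assembles the blocks on processor 0.  Assumes p divides n.
implicit none
include 'mpif.h'
integer, parameter:: n = 8
real, dimension(n,n):: a,b,c
real, dimension(:,:), allocatable:: loc_c
integer:: my_rank,p,ierr,loc_n,bn,en
call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world,my_rank,ierr)
call mpi_comm_size(mpi_comm_world,p,ierr)
a = 1.0                              ! simple test data
b = 2.0
loc_n = n/p                          ! columns per processor
bn = 1 + my_rank*loc_n
en = bn + loc_n - 1
allocate(loc_c(n,loc_n))
loc_c = matmul(a,b(:,bn:en))         ! local column block of the product
call mpi_gather(loc_c,n*loc_n,mpi_real,c,n*loc_n,mpi_real,0,&
                mpi_comm_world,ierr)
if (my_rank.eq.0) print*,'c(1,1) = ',c(1,1)   ! equals 2*n = 16
deallocate(loc_c)
call mpi_finalize(ierr)
end program mmgather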
7.3.5 Exercises
1. Duplicate the calculations for scatmpi.f and experiment with different
numbers of processors.
2. Duplicate the calculations for gathmpi.f and experiment with different
numbers of processors.
3. Duplicate the calculations for dot3mpi.f and experiment with different
numbers of processors and different size vectors.
4. Use mpi_gather() to compute in parallel a linear combination of the two
vectors, αx + βy.
5. Use mpi_gather() to modify trapmpi.f to execute Simpson’s rule in
parallel.
7.4 Grouped Data Types
7.4.1 Introduction
There is some startup time associated with each MPI subroutine. So if a large
number of calls to mpi_send() and mpi_recv() are made, then the communication
portion of the code may be significant. By collecting data into groups, a
single communication subroutine may be used for large amounts of data. Here
we present three methods for grouping data: count, derived types
and packed.
7.4.2 Count Type
The count parameter has already been used in some of the previous codes. The
parameter count refers to the number of mpi_datatypes to be communicated.
The most common data types are mpi_real or mpi_int, and these are usually
stored in arrays whose components are addressed sequentially. In Fortran the
components of two-dimensional arrays are stored by columns starting with the
leftmost column. For example, if the array is b(1:2,1:3), then the list for b is
b(1,1), b(2,1), b(1,2), b(2,2), b(1,3) and b(2,3). Starting at b(1,1) with count
= 4 gives the first four components, and starting at b(1,2) with count = 4 gives
the last four components.
The code countmpi.f illustrates the count parameter method when it is used
in the subroutine mpi_bcast(). Lines 14-24 initialize, on processor 0, the two arrays
a(1:4) and b(1:2,1:3). All of the array a is broadcast, in line 29, to the other
processors, and just the first four components of the two-dimensional array b
are broadcast, in line 30, to the other processors. This is confirmed by the print
commands in lines 26, 32 and 33.
MPI/Fortran 9x Code countmpi.f
1. program countmpi
2.! Illustrates count for arrays.
3. implicit none
4. include ’mpif.h’
5. real, dimension(1:4):: a
6. integer, dimension(1:2,1:3):: b
7. integer:: my_rank,p,n,source,dest,tag,ierr,loc_n
8. integer:: i,j,status(mpi_status_size)
9. data n,dest,tag/4,0,50/
10. call mpi_init(ierr)
11. call mpi_comm_rank(mpi_comm_world,my_rank,ierr)
12. call mpi_comm_size(mpi_comm_world,p,ierr)
13.! Define the arrays.
14. if (my_rank.eq.0) then
15. a(1) = 1.
16. a(2) = exp(1.)
17. a(3) = 4*atan(1.)
18. a(4) = 186000.
19. do j = 1,3
20. do i = 1,2
21. b(i,j) = i+j
22. end do
23. end do
24. end if
25.! Each processor attempts to print the array.
26. print*,’my_rank =’,my_rank, ’a = ’,a
27. call mpi_barrier(mpi_comm_world,ierr)
28.! The arrays are broadcast via count equal to four.
29. call mpi_bcast(a,4,mpi_real,0,&
mpi_comm_world,ierr)
30. call mpi_bcast(b,4,mpi_int,0,&
mpi_comm_world,ierr)
31.! Each processor prints the arrays.
32. print*,’my_rank =’,my_rank, ’a = ’,a
33. print*,’my_rank =’,my_rank, ’b = ’,b
34. call mpi_finalize(ierr)
35. end program countmpi
my_rank = 0 a = 1.000000000 2.718281746
3.141592741 186000.0000
my_rank = 1 a = -0.1527172301E+11 -0.1775718601E+30
0.8887595380E-40 0.7346867719E-39
my_rank = 2 a = -0.1527172301E+11 -0.1775718601E+30
0.8887595380E-40 0.7346867719E-39
my_rank = 3 a = -0.1527172301E+11 -0.1775718601E+30
0.8887595380E-40 0.7346867719E-39
!
my_rank = 0 a = 1.000000000 2.718281746
3.141592741 186000.0000
my_rank = 0 b = 2 3 3 4 4 5
my_rank = 1 a = 1.000000000 2.718281746
3.141592741 186000.0000
my_rank = 1 b = 2 3 3 4 -803901184 -266622208
my_rank = 2 a = 1.000000000 2.718281746
3.141592741 186000.0000
my_rank = 2 b = 2 3 3 4 -804478720 -266622208
my_rank = 3 a = 1.000000000 2.718281746
3.141592741 186000.0000
my_rank = 3 b = 2 3 3 4 -803901184 -266622208
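The remark above about starting the count at b(1,2) can be checked directly. The following one-line fragment is a sketch (it assumes the same arrays and setup as countmpi.f and uses the standard Fortran datatype name mpi_integer); with count = 4 it broadcasts exactly b(1,2), b(2,2), b(1,3) and b(2,3), the last four components in column order.

! Sketch: broadcast only the last four components of b by starting at b(1,2).
call mpi_bcast(b(1,2),4,mpi_integer,0,mpi_comm_world,ierr)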
7.4.3 Derived Type
If the data to be communicated is either of mixed type or is not adjacent in
memory, then one can create a user-defined mpi_type. For example, the data
to be grouped may have some mpi_real, mpi_int and mpi_char entries and
be in nonadjacent locations in memory. The derived type must have four items
for each entry: blocks or count of each mpi_type, type list, address in memory
and displacement. The address in memory can be obtained by the MPI subroutine
call mpi_address(a,addresses(1),ierr), where a is one of the entries in the new
data type.
The following code dertypempi.f creates a new data type, which is called
data_mpi_type. It consists of four entries with one mpi_real, a, one mpi_real,
b, one mpi_integer, c, and one mpi_integer, d. These entries are initialized on processor
0 by lines 19-24. In order to communicate them as a single new data type via
mpi_bcast(), the new data type is created in lines 26-43. The four arrays
blocks, typelist, addresses and displacements are initialized. The call in line 42
to mpi_type_struct(4, blocks, displacements, typelist, data_mpi_type, ierr)
enters this structure and identifies it with the name data_mpi_type. Finally
the call in line 43 to mpi_type_commit(data_mpi_type,ierr) finalizes this user-
defined data type. The call to mpi_bcast() in line 52 addresses the first entry
of the data_mpi_type and uses count = 1 so that the data a, b, c and d will
be broadcast to the other processors. This is verified by the print commands
in lines 46-49 and 54-57.
MPI/Fortran 9x Code dertypempi.f
1. program dertypempi
2.! Illustrates a derived type.
3. implicit none
4. include ’mpif.h’
5. real:: a,b
6. integer::c,d
7. integer::data_mpi_type
8. integer::ierr
9. integer, dimension(1:4)::blocks
10. integer, dimension(1:4)::displacements
11. integer, dimension(1:4)::addresses
12. integer, dimension(1:4)::typelist
13. integer:: my_rank,p,n,source,dest,tag,loc_n
14. integer:: i,status(mpi_status_size)
15. data n,dest,tag/4,0,50/
16. call mpi_init(ierr)
17. call mpi_comm_rank(mpi_comm_world,my_rank,ierr)
18. call mpi_comm_size(mpi_comm_world,p,ierr)
19. if (my_rank.eq.0) then
20. a = exp(1.)
21. b = 4*atan(1.)
22. c = 1
23. d = 186000
24. end if
25.! Dene the new derived type, data_mpi_type.
26. typelist(1) = mpi_real
27. typelist(2) = mpi_real
28. typelist(3) = mpi_integer
29. typelist(4) = mpi_integer
30. blocks(1) = 1
31. blocks(2) = 1
32. blocks(3) = 1
33. blocks(4) = 1
34. call mpi_address(a,addresses(1),ierr)
35. call mpi_address(b,addresses(2),ierr)
36. call mpi_address(c,addresses(3),ierr)
37. call mpi_address(d,addresses(4),ierr)
38. displacements(1) = addresses(1) - addresses(1)
39. displacements(2) = addresses(2) - addresses(1)
40. displacements(3) = addresses(3) - addresses(1)
41. displacements(4) = addresses(4) - addresses(1)
42. call mpi_type_struct(4,blocks,displacements,&
typelist,data_mpi_type,ierr)
43. call mpi_type_commit(data_mpi_type,ierr)
44.! Before the broadcast of the new type data_mpi_type
45.! try to print the data.
46. print*,’my_rank =’,my_rank, ’a = ’,a
47. print*,’my_rank =’,my_rank, ’b = ’,b
48. print*,’my_rank =’,my_rank, ’c = ’,c
49. print*,’my_rank =’,my_rank, ’d = ’,d
50. call mpi_barrier(mpi_comm_world,ierr)
51.! Broadcast data_mpi_type.
52. call mpi_bcast(a,1,data_mpi_type,0,&
mpi_comm_world,ierr)
53.! Each processor prints the data.
54. print*,’my_rank =’,my_rank, ’a = ’,a
55. print*,’my_rank =’,my_rank, ’b = ’,b
56. print*,’my_rank =’,my_rank, ’c = ’,c
57. print*,’my_rank =’,my_rank, ’d = ’,d
58. call mpi_finalize(ierr)
59. end program dertypempi
my_rank = 0 a = 2.718281746
my_rank = 0 b = 3.141592741
my_rank = 0 c = 1
my_rank = 0 d = 186000
my_rank = 1 a = 0.2524354897E-28
my_rank = 1 b = 0.1084320046E-18
my_rank = 1 c = 20108
my_rank = 1 d = 3
my_rank = 2 a = 0.2524354897E-28
my_rank = 2 b = 0.1084320046E-18
my_rank = 2 c = 20108
my_rank = 2 d = 3
my_rank = 3 a = 0.2524354897E-28
my_rank = 3 b = 0.1084320046E-18
my_rank = 3 c = 20108
my_rank = 3 d = 3
!
my_rank = 0 a = 2.718281746
my_rank = 0 b = 3.141592741
my_rank = 0 c = 1
my_rank = 0 d = 186000
my_rank = 1 a = 2.718281746
my_rank = 1 b = 3.141592741
my_rank = 1 c = 1
my_rank = 1 d = 186000
my_rank = 2 a = 2.718281746
my_rank = 2 b = 3.141592741
my_rank = 2 c = 1
my_rank = 2 d = 186000
my_rank = 3 a = 2.718281746
my_rank = 3 b = 3.141592741
my_rank = 3 c = 1
my_rank = 3 d = 186000
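A remark on names: mpi_address() and mpi_type_struct() are the original MPI-1 routines; later versions of the MPI standard replace them with mpi_get_address() and mpi_type_create_struct(), whose address and displacement arguments have kind mpi_address_kind. A sketch of the corresponding calls for the same derived type (assuming addresses and displacements are redeclared with that kind, and the same blocks, typelist and data_mpi_type as in dertypempi.f) is:

! Sketch: MPI-2 style construction of data_mpi_type.
integer(kind=mpi_address_kind), dimension(1:4):: addresses,displacements
call mpi_get_address(a,addresses(1),ierr)
call mpi_get_address(b,addresses(2),ierr)
call mpi_get_address(c,addresses(3),ierr)
call mpi_get_address(d,addresses(4),ierr)
displacements = addresses - addresses(1)
call mpi_type_create_struct(4,blocks,displacements,&
     typelist,data_mpi_type,ierr)
call mpi_type_commit(data_mpi_type,ierr)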
7.4.4 Packed Type
The subroutine mpi_pack() relocates data to a new array, which is addressed
sequentially. Communication subroutines such as mpi_bcast() can be used with
the count parameter to send the data to other processors. The data is then
unpacked from this array by mpi_unpack().
mpi_pack(locdata, count, mpi_datatype,
packarray, packcount, position, mpi_comm, ierr)
locdata array(*)
count integer
mpi_datatype integer
packarray array(*)
packcount integer
position integer
mpi_comm integer
ierr integer
mpi_unpack(packarray, packcount, position,
locdata, count, mpi_datatype, mpi_comm, ierr)
packarray array(*)
packcount integer
position integer
locdata array(*)
count integer
mpi_datatype integer
mpi_comm integer
ierr integer
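As a sketch of how these two subroutines are typically paired (assumed variable names, not the book's packmpi.f): processor 0 packs a real a and an integer c into a character buffer, the buffer is broadcast, and the other processors unpack it in the same order.

! Sketch: pack on processor 0, broadcast, unpack elsewhere.
character, dimension(1:100):: buffer
integer:: position
position = 0
if (my_rank.eq.0) then
   call mpi_pack(a,1,mpi_real,buffer,100,position,&
                 mpi_comm_world,ierr)
   call mpi_pack(c,1,mpi_integer,buffer,100,position,&
                 mpi_comm_world,ierr)
end if
call mpi_bcast(buffer,100,mpi_packed,0,mpi_comm_world,ierr)
if (my_rank.ne.0) then
   position = 0
   call mpi_unpack(buffer,100,position,a,1,mpi_real,&
                   mpi_comm_world,ierr)
   call mpi_unpack(buffer,100,position,c,1,mpi_integer,&
                   mpi_comm_world,ierr)
end if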
In packmpi.f four variables on processor 0 are initialized in lines 17-18 and
packed into the array numbers in lines 21-25. Then in lines 26 and 28 the
array numbers is broadcast to the other processors. In lines 30-34 this data is
unpacked to the original local variables, which are duplicated on each of the
other processors. The print commands in lines 37-40 verify this.
MPI/Fortran 9x Code packmpi.f
1. program packmpi
2.! Illustrates mpi_pack and mpi_unpack.
3. implicit none
4. include ’mpif.h’
5. real:: a,b
6. integer::c,d,location
7. integer::ierr
8. character, dimension(1:100)::numbers
9. integer:: my_rank,p,n,source,dest,tag,loc_n
10. integer:: i,status(mpi_status_size)
11. data n,dest,tag/4,0,50/