The code shown in Example 5.7 consists of a nest of two do loops that scan all the elements in the grid,
computing the average of each element as the code proceeds. With the parallel do directive as shown,
we execute the do j loop in parallel, thereby processing the columns of the grid concurrently across
multiple threads. However, this code contains a data dependence from one iteration of the do j loop to
another. While iteration j computes a(i,j), iteration j + 1 uses this same value of a(i,j) in computing the
average for the value at a(i,j + 1). As a result, this memory location may simultaneously be read in one
iteration while it is being modified in another iteration.
Given the loop-carried data dependence in this example, the parallel code contains a data race and may
behave differently from the original, serial version of the code. However, this code can provide acceptable
behavior in spite of this data race. As a consequence of this data race, it is possible that an iteration may
occasionally use an older value of a(i,j − 1) or a(i,j + 1) from a preceding or succeeding column, rather
than the latest value. However, the algorithm is tolerant of such errors and will still converge to the
desired solution within an acceptable number of time-step iterations. This class of algorithms is called
relaxed algorithms, in which the parallel version does not implement the strict constraints of the original
serial code, but instead carefully relaxes some of them to exploit parallelism. As a result, this
code can also successfully run in parallel without any synchronization and benefit from additional
speedup.
A final word of caution is in order about this example. We have assumed that at any time a(i,j) has either
an old value or a new value. This assumption in turn rests on the implicit assumption that a(i,j) is of a
primitive type, such as a floating-point number. However, imagine a scenario in which the type of a(i,j) is
no longer primitive but is instead complex or structured. It is then possible that while a(i,j) is in the process of
being updated, it has neither an old nor a new value, but rather a value that is somewhere in between.
While one iteration updates the new value of a(i,j) through multiple write operations, another thread in a
different iteration may be reading a(i,j) and obtaining a mix of old and new partial values, violating our
assumption above. Tricks such as this, therefore, must be used carefully with an understanding of the
granularity of primitive write operations in the underlying machine.
5.2.3 Synchronization Mechanisms in OpenMP
Having understood the basics of data races and some ways of avoiding them, we now describe the
synchronization mechanisms provided in OpenMP. Synchronization mechanisms can be broadly
classified into two categories: mutual exclusion and event synchronization.
Mutual exclusion synchronization is typically used to acquire exclusive access to shared data structures.
When multiple threads need to access the same data structure, mutual exclusion synchronization
can be used to guarantee that (1) only one thread has access to the data structure for the duration of the
synchronization construct, and (2) accesses by multiple threads are interleaved at the granularity of the
synchronization construct.
Event synchronization is used to signal the completion of some event from one thread to another. For
instance, although mutual exclusion provides exclusive access to a shared object, it does not provide any
control over the order in which threads access the shared object. Any desired ordering between threads
is implemented using event synchronization constructs.
5.3 Mutual Exclusion Synchronization
OpenMP provides three different constructs for mutual exclusion. Although the basic functionality
provided is similar, the constructs are designed hierarchically and range from restrictive but easy to use at
one extreme, to powerful but requiring more skill on the part of the programmer at the other extreme. We
present these constructs in order of increasing power and complexity.
5.3.1 The Critical Section Directive
The general form of the critical section in Fortran is
!$omp critical [(name)]
block