3.5.1 Why Data Dependences Are a Problem
Whenever one statement in a program reads or writes a memory location, and another statement reads
or writes the same location, and at least one of the two statements writes the location, we say that there
is a data dependence on that memory location between the two statements. In this context, a memory
location is anything to which the program can assign a scalar value, such as an integer, character, or
floating-point value. Each scalar variable, each element of an array, and each field of a structure
constitutes a distinct memory location. Example 3.11 shows a loop that contains a data dependence:
each iteration (except the last) writes an element of a that is read by the next iteration. Of course, a single
statement can contain multiple memory references (reads and writes) to the same location, but it is
usually the case that the references involved in a dependence occur in different statements, so we will
assume this from now on. In addition, there can be dependences on data external to the program, such
as data in files that is accessed using I/O statements, so if you wish to parallelize code that accesses
such data, you must analyze the code for dependences on this external data as well as on data in
variables.
Example 3.11: A simple loop with a data dependence.
do i = 2, N
a(i) = a(i) + a(i - 1)
enddo
For purposes of parallelization, data dependences are important because whenever there is a
dependence between two statements on some location, we cannot execute the statements in parallel. If
we did execute them in parallel, it would cause what is called a data race. A parallel program contains a
data race whenever it is possible for two or more statements to read or write the same memory location at
the same time, and at least one of the statements writes the location.
In general, data races cause correctness problems because when we execute a parallel program that
contains a data race, it may not produce the same results as an equivalent serial program. To see why,
consider what might happen if we try to parallelize the loop in Example 3.11 by naively applying a parallel
do directive. Suppose n is 3, so the loop iterates just twice, and at the start of the loop, the first three
elements of a have been initialized to the values 1, 2, and 3. After a correct serial execution, the first three
values are 1, 3, and 6. However, in a parallel execution it is possible for the assignment of the value 3 to
a(2) in the first iteration to happen either before or after the read of a(2) in the second iteration (the two
statements are "racing" each other). If the assignment happens after the read, a(3) receives an incorrect
value of 5.
3.5.2 The First Step: Detection
Now that we have seen why data dependences are a problem, the first step in dealing with them is to
detect any that are present in the loop we wish to parallelize. Since each iteration executes in parallel, but
within a single iteration statements in the loop body are performed in sequence, the case that concerns
us is a dependence between statements executed in different iterations of the loop. Such a dependence
is called loop-carried.
Because dependences are always associated with a particular memory location, we can detect them by
analyzing how each variable is used within the loop, as follows:
Is the variable only read and never assigned within the loop body? If so, there are no dependences
involving it.
Otherwise, consider the memory locations that make up the variable and that are assigned within the
loop. For each such location, is there exactly one iteration that accesses the location? If so, there are
no dependences involving the variable. If not, there is a dependence.
To perform this analysis, we need to reason about each memory location accessed in each iteration of
the loop. Reasoning about scalar variables is usually straightforward, since they uniquely identify the
memory location being referenced. Reasoning about array variables, on the other hand, can be tricky
because different iterations may access different locations due to the use of array subscript expressions
that vary from one iteration to the next. The key is to recognize that we need to find two different values of
the parallel loop index variable (call them i and í ) that both lie within the bounds of the loop, such that
iteration i assigns to some element of an array a, and iteration í reads or writes that same element of a. If
we can find suitable values for i and í, there is a dependence involving the array. If we can satisfy