nonremovable, 78-81
output, 72, 74-76
problem of, 66-67
removing, 65-82, 73-81, 82
resources, 66
simple loop with, 66
data races
acceptable, examples of, 144-146
without conflicts, 143
correctness problems, 66
defined, 66
eradicating, with privatization, 143
eradicating, with reductions, 144
illustrating, 142
preventing, 81
default clause, 57-58
defined, 48
forms, 58
parallel directive, 95
syntax, 58
directives, 15-16, 17-20
atomic, 130, 152-155
barrier, 22, 22-23, 157-159
benefits of, 16
in C/C++, 6
code example, 6
continued to next line, 18
critical, 23, 33-34, 45, 130
defined, 6
do, 112-114
end critical, 148
end parallel, 7, 37
end parallel do, 30, 42
end sections, 115
end single, 117
flush, 163-165
in Fortran, 6
function of, 6
master, 45, 130, 161-162
ordered, 45, 130, 159-160
parallel, 37, 45, 94-100
parallel do, 23-24, 28, 29-30, 41-45, 142
parallel for, 43, 177
sections, 45, 114-116
single, 45, 117-119
syntax, 17-18
threadprivate, 103-106
work-sharing, 111-119
distributed memory, 8-9
application development/debugging environments, 11
complexity, 11
impact on code quantity/quality, 11
scalability and, 10-11
shared memory vs., 10
distributed shared memory (DSM) systems, 8
do directive, 112-114
barrier directive and, 123
clauses, 113
defined, 112
for exploiting SPMD-style parallelism, 114
parallel region construct with, 113-114
syntax, 113
using, 112
do loops, 41
dividing iterations of, 112
iterations, 25
parallel region, 98
See also loops
domain decompositions
loop-level parallelism vs., 175
with non-sparse algorithms, 176
dynamic schedules, 87, 89
appeal of, 186-187
costs, 177
defined, 86
flexibility, 88
impact on performance, 187
load balancing with, 177, 178
locality and, 177-178
for scaling, 187
See also loop schedules
dynamic threads, 133-134, 172, 201-204
default, 134
defined, 133
performance and, 201-204
space sharing and, 203
use of, 202
See also threads
E
End critical directive, 148
end parallel do directive, 30, 42
end sections directive, 115
end single directive, 117
environment variables
defined, 16
numerical values, 131
OMP_DYNAMIC, 134, 137
OMP_NESTED, 135, 137
OMP_NUM_THREADS, 7, 38, 131-132, 137
OMP_SCHEDULE, 135, 137
summary, 137
event synchronization, 157-162
barrier, 22
constructs, 157-162
defined, 147
uses, 22
See also synchronization
exercises
loop-level parallelism, 90-93
parallel regions, 138-139
performance, 207-209
starting with OpenMP, 40
synchronization, 168-169
explicit synchronization, 32-35
F
False sharing, 167-168, 189-191
defined, 190
exhibition of, 190
NUMA multiprocessors and, 206
fine-grained parallelism, 36
firstprivate clause, 63-65
in C++, 64
defined, 48, 63
form and usage, 63
objects in C++, 65
parallel loop with, 64
uses, 64
variable initiation, 64
See also lastprivate clause; private clause
fissioning, 79-80
defined, 79-80
loop parallelization using, 81
part loop parallelization with, 80
floating-point variables, 62
flow dependences
caused by reduction, removing, 76
defined, 72
loop-carried, 77, 81
parallel version with, 76, 77, 78
removing, with induction variable elimination, 77
removing, with loop skewing, 78
removing, with reduction clause, 76
serial version with, 76, 77, 78
See also data dependences
flush directive, 163-165
default, 164
defined, 163
each thread execution of, 164
producer/consumer example using, 164-165
syntax, 163
use of, 164-165
for loops, 41, 44-45
canonical shape, 44
increment expressions, 45
index, 44
start and end values, 44
Fortran, 2, 6
atomic directive syntax, 152-153
barrier directive syntax, 158
critical directive syntax, 147
cycle construct, 45
default clause syntax, 58
default scoping rules in, 55-56
directives in, 6
do directive syntax, 113
flush directive syntax, 163
Fortran-95, 15
High Performance (HPF), 9, 13
master directive syntax, 161
parallel directive syntax, 94
parallel do directive syntax, 43
Pthreads support, 12
reduction operators for, 60
sample scoping clauses in, 50
section directive syntax, 114-115
sentinels, 18-19
single directive syntax, 117
threadprivate directive syntax, 105
threadprivate variables and, 106
See also C/C++
Fourier-Motzkin projection, 68
Fujitsu VPP5000, 2
G
Gang scheduling, 203-204
defined, 203
space sharing vs., 204
global synchronization, 150
global variables, 101
goto statement, 121
granularity, 173-175
concept, 174-175
defined, 172
See also performance
grids, 127
guided schedules
benefits, 89
chunks, 89
defined, 87
See also loop schedules
guided self-scheduling (GSS) schedules, 176, 179
H
Heap-allocated storage, 54
hierarchy of mechanisms, 168
High Performance Fortran (HPF), 9, 13
HP 9000 V-Class, 8
I-K
IBM SP-2, 9
if clause, 131
defined, 43
parallel directive, 96
parallel do directive, 83-84
uses, 83-84
inconsistent parallelization, 191-192
incremental parallelization, 4
induction variable elimination, 77
inductions, 77