Index ■ I-5
bubbles, pipeline, A-13, A-20, E-47. See also pipeline stalls
buckets, in histograms, 382
buffered crossbars, E-62
buffered wormhole switching, E-51
buffers. See also translation lookaside buffers
  branch-prediction, 82–86, 83, 84, 85
  branch-target, 122–125, 122, 124
  buffered crossbars, E-62
  central, E-57
  development of, K-22
  in disk storage, 360, 360
  instruction, A-15
  limited, H-38 to H-40
  load, 94, 94, 95, 97, 101, 102–103
  reorder, 106–114, 107, 110, 111, 113, G-31 to G-32
  streaming, K-54
  victim, 301, 330, C-14
  write, 210, 289, 291, 300–301, 301, 309
bundles, G-34 to G-36, G-36, G-37
Burks, A. W., 287, I-62, K-3
buses
  in barrier synchronization, H-15 to H-16, H-16
  bottlenecks in, 216, 216
  data misses and, H-26, H-26
  development of, K-62 to K-63
  fairness in, H-13
  point-to-point links replacing, 390, 390
  in scoreboarding, A-70, A-73
  in shared-media networks, E-22, E-22, E-40
  single-bus systems, 217–218
  in snooping coherence protocols, 211–212, 213, 214, 215
  in Tomasulo's approach, 93, 95, 96, 98, 101
  in write invalidates, 209–210, 212, 213
bypassing. See forwarding
byte addressing, 9, 299
byte order, B-7 to B-8, B-8
C
C description language extensions, B-9, B-36 to B-37
C language, integer division and remainder in, I-12
caches
  2:1 cache rule of thumb, C-28
  AMD Opteron data cache example, C-12 to C-14, C-13, C-15
  block addressing, 299, 299, C-8 to C-9, C-8
  block placement in, C-7 to C-8, C-7
  block size, H-25 to H-26, H-25
  data, C-9, C-13, C-15, F-46
  defined, C-2
  development of, K-53
  in IBM Blue Gene/L, H-42
  interprocessor communication and, H-5
  L1 (See L1 cache)
  L2 (See L2 cache)
  multibanked, 298–299, 299, 309
  multilevel, 291
  nonblocking, 296–298, 297, 309, K-54
  remote communication latency and, 205
  in RISC pipelines, A-7
  separated, C-14
  SMT challenges to, 176–177
  states of, 212
  in superscalar processors, 155
  tags in, 210–211, 289, C-36
  trace, 131, 132, 133, 296, 309
  victim, 301, K-54
  virtual, C-36 to C-38, C-37
  virtual memory compared with, C-40, C-41
  writes in, C-9 to C-12, C-10, C-13
cache access, pipelined, 296
cache associativity. See associativity
cache banks, 298–299, 299
cache blocks. See blocks
cache coherence problem. See also cache coherence protocols
  cache coherence protocols and, 207–208
  I/O, 325–326
  overview of, 205–207, 206
  snooping protocols, 208–209, 209
  snooping protocol example, 211–215, 213, 214, 215
  state diagrams and, 214, 215
  write invalidate protocol implementation, 209–211
cache coherence protocols. See also cache coherence problem; directory-based cache coherence protocols
  avoiding deadlock from limiting buffering, H-38 to H-40
  directory controller implementation, H-40 to H-41
  in distributed shared-memory multiprocessors, H-36 to H-37
  in distributed-memory multiprocessors, 232–233, 232, 233
  in large-scale multiprocessors, H-34 to H-41
  memory consistency in, 243–246
  snooping protocols and, 208–218, H-34, H-35
  spin lock scheme and, 241–242, 242
  synchronization and, 240–242, 242
  uniform memory access and, 217
cache CPI equation, 168
cache hierarchy. See also memory hierarchy
  in AlphaServer 4100, 220
  cache organization overview, 288–293, 292
  multilevel inclusion, 248–249
cache hits, C-2
cache indexing
  address translation during, 291–292, C-36 to C-38, C-37, C-39
  in AMD Opteron, 326, 329, C-12
  equation for, 326, 329, C-21
  index size and, C-38, C-46
cache misses
  block replacement in, C-10, C-14
  categories of, C-22 to C-24, C-23, C-24
  communication of, H-35 to H-36
  defined, C-2
  in in-order execution, C-2 to C-3
  in invalidate protocols, 210–211
cache misses (continued)