Internet applications) has been important in the steady increase in available parallelism for these sorts of applications.
A large and growing collection of applications lies between those extremes. In these applications, there is parallelism to be exploited, but it is not easy to extract: it is less regular and less structured in its spatial and temporal control and its data-access and communication patterns. One might argue that these have been the focus of the high-performance computing (HPC) research community for many decades and thus are well understood with respect to those aspects that are amenable to parallel decomposition. The research community also knows that algorithms best suited for a serial machine (for example, quicksort, simplex, and gaston) differ from their counterparts that are best suited for parallel machines
BOX 5.1
React, But Don’t Overreact, to Parallelism
As this report makes clear, software and hardware researchers and practitioners should address important concerns regarding parallelism. At such critical junctures, enthusiasm seems to dictate that all talents and resources be applied to the crisis at hand. Taking the longer view, however, a prudent research portfolio must include concomitant efforts to advance all systems aspects, lest they become tomorrow’s bottlenecks or crises.
For example, in the rush to innovate on chip multiprocessors (CMPs), it is tempting to ignore sequential core performance and to deploy many simple cores. That approach may prevail, but history and Amdahl’s law suggest caution. Three decades ago, a hot technology was vectors. Pioneering vector machines, such as the Control Data STAR-100 and Texas Instruments ASC, advanced vector technology without great concern for improving other aspects of computation. Seymour Cray, in contrast, designed the Cray-1¹ to have great vector performance as well as to be the world’s fastest scalar computer. Ultimately, his approach prevailed, and the early machines faded away.
Moreover, Amdahl’s law raises concern.² Amdahl’s limit argument assumed that a fraction, P, of software execution is infinitely parallelizable without overhead, whereas the remaining fraction, 1 - P, is totally sequential. From that assumption, it follows that the speedup with N cores (execution time on one core divided by execution time on N cores) is governed by 1/[(1 - P) + P/N]. Many learn that equation, but it is still instructive to consider its harsh numerical consequences. For N = 256 cores and a parallel fraction P = 99%, for example, speedup is bounded by 72. Moreover, Gustafson³ made good “weak scaling” arguments for why some software will fare much better. Nevertheless, the committee is skeptical that most future software will avoid sequential bottlenecks. Even such a very parallel approach as MapReduce⁴ has near-sequential activity as the reduce phase draws to a close.
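A minimal sketch of that arithmetic, in Python, appears below; the helper name amdahl_speedup and the particular core counts are chosen only for illustration, and the code simply evaluates the bound 1/[(1 - P) + P/N] quoted above.

    def amdahl_speedup(p, n):
        # Upper bound on speedup when a fraction p of the work is
        # perfectly parallelizable across n cores and the rest is serial.
        return 1.0 / ((1.0 - p) + p / n)

    # With P = 99% parallel, the bound saturates far below the core count.
    for n in (16, 64, 256, 1024):
        print(f"N = {n:4d}: speedup bound = {amdahl_speedup(0.99, n):.1f}")
    # N = 256 yields roughly 72, the figure quoted above; as N grows without
    # bound, the speedup approaches 1/(1 - P) = 100, however many cores exist.

The limiting value 1/(1 - P) is the heart of the concern: even a 1% sequential fraction caps the benefit of hundreds of cores.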
For those reasons, it is prudent to continue work on faster sequential cores,