4.9 Fallacies and Pitfalls
rest was fully parallel—hence, linear speedup with 1000 processors. Because the running time grew faster than linear, the program actually ran longer after scaling, even with 1000 processors.
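To see how this can happen, consider a hypothetical case (the exponents and factors below are our illustration, not the numbers from the example above). Suppose the computation's work grows as the cube of the problem dimension n, while its memory footprint grows as the square. Scaling the data set to fill 1000 times the memory raises n by a factor of sqrt(1000), or about 31.6, and hence the total work by a factor of about 31,600:

\frac{T_{1000}}{T_1} = \frac{(\sqrt{1000}\,n)^3 / 1000}{n^3} = \frac{1000^{3/2}}{1000} = \sqrt{1000} \approx 31.6

Even with perfect linear speedup on 1000 processors, the scaled run takes roughly 31.6 times as long as the original uniprocessor run.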
Speedup that assumes scaling of the input is not the same as true speedup, and reporting it as if it were is misleading. Since parallel benchmarks are often run on different-sized multiprocessors, it is important to specify what type of application scaling is permissible and how that scaling should be done. Although simply scaling the data size with processor count is rarely appropriate, assuming a fixed problem size for a much larger processor count is often inappropriate as well, since it is likely that users given a much larger multiprocessor would opt to run a larger or more detailed version of an application. In Appendix H, we discuss different methods for scaling applications for large-scale multiprocessors, introducing a model called time-constrained scaling, which scales the application data size so that execution time remains constant across a range of processor counts.
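In symbols (the notation here is ours, introduced for clarity rather than quoted from Appendix H): let T(n, p) be the execution time for problem size n on p processors, and W(n) the total work. Fixed-size (true) speedup and time-constrained scaling can then be written as

S_{\text{fixed}}(p) = \frac{T(n,1)}{T(n,p)}, \qquad
S_{\text{TC}}(p) = \frac{W(n_p)}{W(n)} \quad \text{where } n_p \text{ satisfies } T(n_p, p) = T(n, 1).

Under time-constrained scaling, the figure of merit is how much more work the larger machine completes in the same time, not how much faster it finishes a fixed amount of work.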
Fallacy Linear speedups are needed to make multiprocessors cost-effective.
It is widely recognized that one of the major benefits of parallel computing is to
offer a “shorter time to solution” than the fastest uniprocessor. Many people,
however, also hold the view that parallel processors cannot be as cost-effective as
uniprocessors unless they can achieve perfect linear speedup. This argument says
that because the cost of the multiprocessor is a linear function of the number
of processors, anything less than linear speedup means that the ratio of
performance/cost decreases, making a parallel processor less cost-effective than
using a uniprocessor.
The problem with this argument is that cost is not only a function of processor count, but also depends on memory, I/O, and the overhead of the system (box, power supply, interconnect, etc.).
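The following Python sketch makes this rebuttal concrete; every constant in it (the fixed infrastructure cost, the per-processor cost, and the p**0.85 speedup curve) is an invented assumption for illustration, not data from this chapter or from Figures 4.35 and 4.36. It shows that when cost includes a fixed overhead, performance per dollar can improve with processor count even though speedup is well below linear.

# Hypothetical cost-effectiveness sketch. Every constant below is an
# assumption chosen for illustration, not data from the chapter.
FIXED_COST = 20_000           # cabinet, power supplies, interconnect (assumed)
COST_PER_PROCESSOR = 5_000    # processor plus its share of memory (assumed)

def system_cost(p):
    """Total cost: fixed infrastructure overhead plus a linear per-processor cost."""
    return FIXED_COST + COST_PER_PROCESSOR * p

def speedup(p):
    """An assumed sublinear speedup curve, roughly p**0.85."""
    return p ** 0.85

base = speedup(4) / system_cost(4)    # normalize to the 4-processor system
for p in (4, 8, 16, 32, 64):
    ratio = speedup(p) / system_cost(p)
    print(f"{p:3d} processors: speedup {speedup(p):5.1f}, "
          f"cost ${system_cost(p):>8,}, relative perf/cost {ratio / base:.2f}")

Under these assumed numbers, every configuration from 8 to 64 processors delivers better performance per dollar than the 4-processor system, despite clearly sublinear speedup; this is the same qualitative effect the text attributes to Figure 4.36.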
The effect of including memory in the system cost was pointed out by Wood and Hill [1995]. We use an example based on more recent data using TPC-C and SPECRate benchmarks, but the argument could also be made with a parallel scientific application workload, which would likely make the case even stronger.
Figure 4.35 shows the speedup for TPC-C, SPECintRate, and SPECfpRate on an IBM eServer p5 multiprocessor configured with 4 to 64 processors. The figure shows that only TPC-C achieves better than linear speedup. For SPECintRate and SPECfpRate, speedup is less than linear, but so is the cost, since, unlike TPC-C, the amounts of main memory and disk required both scale less than linearly.
As Figure 4.36 shows, larger processor counts can actually be more cost-effective than the four-processor configuration. In the future, as the cost of multiple processors decreases compared to the cost of the support infrastructure (cabinets, power supplies, fans, etc.), the performance/cost ratio of larger processor configurations will improve further.
In comparing the cost-performance of two computers, we must be sure to include accurate assessments of both total system cost and what performance is achievable. For many applications with larger memory demands, such a comparison can dramatically increase the attractiveness of using a multiprocessor.