262 ■ Chapter Four Multiprocessors and Thread-Level Parallelism
operating system originally protected the page table data structure with a single
lock, assuming that page allocation is infrequent. In a uniprocessor, this does
not represent a performance problem. In a multiprocessor, it can become a
major performance bottleneck for some programs. Consider a program that
uses a large number of pages that are initialized at start-up, which UNIX does
for statically allocated pages. Suppose the program is parallelized so that multi-
ple processes allocate the pages. Because page allocation requires the use of the
page table data structure, which is locked whenever it is in use, even an OS ker-
nel that allows multiple threads in the OS will be serialized if the processes all
try to allocate their pages at once (which is exactly what we might expect at ini-
tialization time!).
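The single-lock behavior described above can be sketched as a minimal, hypothetical C fragment. The names (page_table, table_lock, allocate_page) and the array-based table are illustrative assumptions, not taken from any real UNIX kernel:

```c
#include <assert.h>
#include <pthread.h>

#define NPAGES 4096

static int page_table[NPAGES];           /* 0 = free, 1 = allocated */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every allocation, from every process, funnels through the single
 * table_lock, so allocations issued in parallel at start-up are
 * forced to execute one at a time. */
static int allocate_page(void)
{
    int page = -1;                       /* -1 means out of pages */
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < NPAGES; i++) {
        if (page_table[i] == 0) {
            page_table[i] = 1;
            page = i;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return page;
}
```

With this structure, even a kernel that allows multiple threads serializes every caller of allocate_page, regardless of which processor the caller runs on.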
This page table serialization eliminates parallelism in initialization and has a
significant impact on overall parallel performance. This performance bottleneck
persists even under multiprogramming. For example, suppose we split the
parallel program apart into separate processes and run them, one process per
processor, so that there is no sharing between the processes. (This is exactly
what one user did, since he reasonably believed that the performance problem
was due to unintended sharing or interference in his application.) Unfortu-
nately, the lock still serializes all the processes—so even the multiprogramming
performance is poor. This pitfall indicates the kind of subtle but significant per-
formance bugs that can arise when software runs on multiprocessors. Like
many other key software components, the OS algorithms and data structures
must be rethought in a multiprocessor context. Placing locks on smaller portions
of the page table effectively eliminates the problem. Similar problems exist in
other shared memory structures, which increase the coherence traffic in cases
where no sharing is actually occurring.
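The fix of locking smaller portions of the page table might be sketched as follows; this is again a hypothetical illustration, assuming the same array-based table split into independently locked regions (NREGIONS, region_lock, and the hint parameter are all invented for the sketch):

```c
#include <assert.h>
#include <pthread.h>

#define NPAGES      4096
#define NREGIONS    64
#define REGION_SIZE (NPAGES / NREGIONS)

static int page_table[NPAGES];                 /* 0 = free, 1 = allocated */
static pthread_mutex_t region_lock[NREGIONS];  /* one lock per region */

static void table_init(void)
{
    for (int r = 0; r < NREGIONS; r++)
        pthread_mutex_init(&region_lock[r], NULL);
}

/* Each thread starts its search in a different region (the hint could
 * be a CPU or thread id), so simultaneous allocations usually take
 * different locks and proceed in parallel instead of serializing. */
static int allocate_page(int hint)
{
    for (int n = 0; n < NREGIONS; n++) {
        int r = (hint + n) % NREGIONS;
        pthread_mutex_lock(&region_lock[r]);
        for (int i = r * REGION_SIZE; i < (r + 1) * REGION_SIZE; i++) {
            if (page_table[i] == 0) {
                page_table[i] = 1;
                pthread_mutex_unlock(&region_lock[r]);
                return i;
            }
        }
        pthread_mutex_unlock(&region_lock[r]);
    }
    return -1;                                 /* no free page anywhere */
}
```

Only when a thread's preferred region is exhausted does it touch another region's lock, so the serialization at initialization time largely disappears.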
4.10 Concluding Remarks

For more than 30 years, researchers and designers have predicted the end of
uniprocessors and their displacement by multiprocessors. During this period,
the rise of microprocessors and their rapid performance growth largely confined
multiprocessing to narrow market segments. In 2006, we are clearly at
an inflection point where multiprocessors and thread-level parallelism will play a
greater role across the entire computing spectrum. This change is driven by sev-
eral phenomena:
1. The use of parallel processing in some domains is much better understood.
First among these is the domain of scientific and engineering computation.
This application domain has an almost limitless thirst for more computation.
It also has many applications that have lots of natural parallelism. Nonethe-
less, it has not been easy: Programming parallel processors even for these
applications remains very challenging, as we discuss further in Appendix H.
2. The growth in server applications for transaction processing and Web ser-
vices, as well as multiprogrammed environments, has been enormous, and