5.3 Memory Technology and Optimizations
. . . the one single development that put computers on their feet was the invention
of a reliable form of memory, namely, the core memory. . . . Its cost was reasonable,
it was reliable and, because it was reliable, it could in due course be made large.
[p. 209]
Maurice Wilkes
Memoirs of a Computer Pioneer (1985)
Main memory is the next level down in the hierarchy. It satisfies the demands of
the caches and serves as the I/O interface, since it is the destination of input as
well as the source of output. Performance measures of main memory emphasize
both latency and bandwidth. Traditionally, main memory latency (which affects
the cache miss penalty) is the primary concern of the cache, while main memory
bandwidth is the primary concern of multiprocessors and I/O. Chapter 4
discusses the relationship of main memory and multiprocessors, and Chapter 6
discusses the relationship of main memory and I/O.
Although caches benefit from low-latency memory, it is generally easier to
improve memory bandwidth with new organizations than it is to reduce latency.
The popularity of second-level caches, and their larger block sizes, makes main
memory bandwidth important to caches as well. In fact, cache designers increase
block size to take advantage of the high memory bandwidth.
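To make that tradeoff concrete, here is a minimal sketch, not taken from the text, that models the miss penalty as the access latency for the first word plus the transfer time for the rest of the block; the 80 ns latency and 8 GB/sec bandwidth are assumed figures chosen only for illustration.

/* Sketch: why high memory bandwidth favors larger cache blocks.
 * Miss penalty is modeled as access latency plus block transfer time.
 * The latency and bandwidth figures are assumed, for illustration only. */
#include <stdio.h>

int main(void) {
    const double access_ns     = 80.0; /* assumed latency to the first word      */
    const double bandwidth_gbs =  8.0; /* assumed bandwidth; 1 GB/sec = 1 byte/ns */

    for (int block_bytes = 32; block_bytes <= 256; block_bytes *= 2) {
        double transfer_ns = block_bytes / bandwidth_gbs; /* bytes / (bytes/ns) */
        double penalty_ns  = access_ns + transfer_ns;
        printf("block %3d bytes: miss penalty %6.1f ns, %5.2f ns per byte\n",
               block_bytes, penalty_ns, penalty_ns / block_bytes);
    }
    return 0;
}

With these assumed numbers the penalty per byte falls as the block grows, which is the effect cache designers exploit when bandwidth is plentiful.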
The previous sections describe what can be done with cache organization to
reduce this processor-DRAM performance gap, but simply making caches larger
or adding more levels of caches cannot eliminate the gap. Innovations in main
memory are needed as well.
In the past, the main innovation was in how to organize the many DRAM chips
that make up main memory, for example, into multiple memory banks. Higher
bandwidth is available by using memory banks, by making the memory and its
bus wider, or by doing both.
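As a sketch of the bank idea, the following hypothetical fragment shows low-order interleaving: with sixteen 64-bit-wide banks (both numbers assumed here, not taken from the text), consecutive 8-byte words map to different banks, so independent banks can serve requests in parallel.

/* Minimal sketch of low-order bank interleaving, assuming a hypothetical
 * system with sixteen 64-bit-wide (8-byte) banks. Consecutive 8-byte words
 * fall in different banks, so independent banks can be accessed in
 * parallel for higher bandwidth. */
#include <stdio.h>
#include <stdint.h>

#define NUM_BANKS  16   /* assumed number of banks */
#define WORD_BYTES  8   /* 64-bit-wide bank        */

int main(void) {
    for (uint64_t addr = 0; addr < 18 * WORD_BYTES; addr += WORD_BYTES) {
        uint64_t word   = addr / WORD_BYTES;
        unsigned bank   = (unsigned)(word % NUM_BANKS); /* bank holding the word */
        uint64_t offset = word / NUM_BANKS;             /* row within that bank  */
        printf("address 0x%03llx -> bank %2u, offset %llu\n",
               (unsigned long long)addr, bank, (unsigned long long)offset);
    }
    return 0;
}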
Ironically, as capacity per memory chip increases, there are fewer chips in a
memory system of the same size, reducing the chances for innovation. For example, a
2 GB main memory takes 256 memory chips of 64 Mbit (16M × 4 bits), easily
organized into sixteen 64-bit-wide banks of 16 memory chips each. However, it takes
only sixteen 256M × 4-bit memory chips for 2 GB, making a single 64-bit-wide bank
the limit. Since computers are often sold and benchmarked with small, standard
memory configurations, manufacturers cannot rely on very large memories to get
bandwidth. This shrinking number of chips in a standard configuration shrinks the
importance of innovations at the board level.
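The arithmetic behind that example can be written out directly; the sketch below simply recomputes the chip and bank counts for the two chip generations mentioned above (64 Mbit parts organized 16M × 4 and 1 Gbit parts organized 256M × 4), assuming 64-bit-wide banks.

/* Worked version of the chip-count arithmetic in the text: how many DRAM
 * chips a 2 GB main memory needs, and how many 64-bit-wide banks those
 * chips can form, for two chip generations. */
#include <stdio.h>

int main(void) {
    const long long mem_bits = 2LL * 1024 * 1024 * 1024 * 8;  /* 2 GB in bits */

    /* 64 Mbit chips organized 16M x 4 and 1 Gbit chips organized 256M x 4 */
    long long chip_bits[2]  = { 64LL << 20, 1024LL << 20 };
    int       chip_width[2] = { 4, 4 };   /* output bits per chip */

    for (int i = 0; i < 2; i++) {
        long long chips          = mem_bits / chip_bits[i];
        long long chips_per_bank = 64 / chip_width[i];   /* 64-bit-wide bank */
        long long banks          = chips / chips_per_bank;
        printf("%4lld Mbit chips: %3lld chips -> %2lld bank(s) of %lld chips\n",
               chip_bits[i] >> 20, chips, banks, chips_per_bank);
    }
    return 0;
}

Running it reproduces the figures above: 256 of the 64 Mbit chips form 16 banks, while 16 of the 1 Gbit chips allow only one.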
Hence, memory innovations are now happening inside the DRAM chips
themselves. This section describes the technology inside the memory chips and
those innovative, internal organizations. Before describing the technologies and
options, let’s go over the performance metrics.
Memory latency is traditionally quoted using two measures: access time and
cycle time. Access time is the time between when a read is requested and when
the desired word arrives, while cycle time is the minimum time between requests
to memory.
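A small numerical sketch may help separate the two measures; the 40 ns access time and 60 ns cycle time below are assumed values, not figures from the text.

/* Sketch: access time versus cycle time. All figures are assumed. */
#include <stdio.h>

int main(void) {
    const double access_ns  = 40.0;  /* assumed: read request -> word available   */
    const double cycle_ns   = 60.0;  /* assumed: minimum spacing between requests */
    const double word_bytes =  8.0;  /* one 64-bit word per access                */

    /* A stream of back-to-back reads is paced by cycle time, not access time. */
    printf("first word arrives after %.0f ns\n", access_ns);
    printf("sustained rate: one word per %.0f ns, about %.2f GB/sec\n",
           cycle_ns, word_bytes / cycle_ns);
    return 0;
}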