bound because
O(N/B)
is the number of I/Os we need to read all N vertices.
As typical practical values of B are measured in thousands this difference can
be extremely significant in practice.
In general the above type of behavior is characteristic for internM-memory al-
gorithms when analyzed in an I/O-environment, and the main reason why many
applications experience a dramatic decrease in performance when the problem
instances get larger than the available internal memory. The lesson one should
learn from this is that one should be very careful about following pointers, and
be careful to ensure a high degree of locality in the access to data (what is
normally referred to as
locality of reference).
3.1 Fundamental External-Memory Bounds
After illustrating how simple internal-memory algorithms can have a terrible
performance in an I/O environment, let us review the fundamental theoretical
bounds in the parallel disk model. For simplicity we give all the bounds in the
one-disk model. In the general D-disk model they should all be divided by D.
Initial theoretical work on I/O-complexity was done by Floyd [47] and by
Hong and Kung [56] who studied matrix transposition and fast Fourier trans-
formation in restricted I/O models. The general I/O model was introduced by
Aggarwal and Vitter [6] and the notion of parallel disks was introduced by Vitter
and Shriver [96]. The latter papers also deal with fundamental problems such as
permutation, sorting and matrix transposition, and a number of authors have
considered the difficult problem of sorting optimally on parallel disks [5, 21,
72, 70]. The problem of implementing various classes of permutations has been
addressed in [38, 39, 40]. More recently researchers have moved on to more spe-
cialized problems in the computational geometry [12, 14, 19, 34, 52, 98], graph
theoretical [13, 14, 34, 33, 69, 65] and string processing areas [15, 35, 45, 46].
As already mentioned the number of I/O operations needed to read the en-
tire input is
N/B
and for convenience we call this quotient n. One normally
uses the term
scanning
to describe the fundamental primitive of reading (or
writing) all elements in a set stored contiguously in external memory by reading
(or writing) the blocks of the set in a sequential manner in
O(n)
I/Os. Fur-
thermore, one says that art algorithm uses a linear number of I/O operations
if it uses
O(n)
I/Os. Similarly, we introduce m =
M/B
which is the number
of blocks that fit in internal memory. Aggarwal and Vitter [6] showed that the
number of I/O operations needed to sort N elements is t2(nlog,,~ n), which is
then the external-memory equivalent of the well-known £2(N log 2 N) internal-
memory bound. 1 ~rthermore, they showed that the number of I/Os needed to
rearrange N elements according to a given permutation is £2(min{N, n log,~ n}).
Taking a closer look at the above bounds for typical values of B and M
reveals that because of the large base of the logarithm, log,~ n is less than 3 or
4 for all reMistic values of N and m. Thus in practice the important term is
-x We define log~ n = max{l, log
n~
log m}. For extremely small values of M and B
the comparison model is assumed in the sorting lower bound--see Mso [16, 17]
218