A sweep algorithm maintains a queue of events, in this case the given points,
sorted by x-coordinate. Thus, it is easy to continually discard points that trail
the front by a distance > 5. When answering a query, we consider only the
"active" points, i.e. those that lie in a narrow strip of width ~ to the left of the
front. Depending on where the current event p lies on the front, any of the active
points can form a new closest pair with p. We maintain the active points, sorted
by y-coordinate, in a table that permits fast "dictionary operations": find, insert,
delete, predecessor and successor.
This standard data type "dictionary" can be implemented using a dozen
different data structures, resulting in different performance characteristics. We
aim at an algorithm that runs in time
O(ntogn)
in the worst case. Balanced
trees implement dictionary operations with a worst case time bound of O(log k),
where k is the number of entries in the dictionary -- for our sweep algorithm,
k < n. Predecessor and successor require only time O(1). The asymptotic run-
time analysis that follows does not depend on the specific choice of data structure
but only on the existence of data structures that guarantee a time bound of
O(log n) for each dictionary operation.
The current event p is inserted into the table according to its y-coordinate
p.y.
Vertically, p lies in the middle of the 2 ~-range of the rectangular query.
Starting at p, we scan the active points in the table both upwards, applying the
operation "successor" until we exit from the rectangle at coordinate
p.y
+ 5 ; as
well as downwards, applying "predecessor" until we exit from the rectangle at
p.y - 5.
Thus, we have visited each point q in the rectangle and submitted it to
a distance computation
d(p, q).
The total time required for processing an event
p thus adds up to:
(insert
p) +
(~ of points q in the rectangle) × (predecessor or successor + evaluate
d(p, q)).
Insert requires time O(log n); predecessor, successor and evaluate
d(p, q)
need
time O(1). The number of points in the rectangle is < 5. Thus, processing an
event is dominated by the insertion of p into the dictionary and takes time
O(Iogn). With a total of n events we have found a simple, efficient
O(nlogn)
algorithm for the problem of the closest pair. Assuming that standard operations
such as sorting and dictionary access can be obtained from a program library,
this sweep algorithm takes at most one additional page of source code.
The success of sweep algorithms is due to a clever trick. By treating the x-a~is
as a time-dimension, a 2-d problem is reduced to a sequence of 1-d problems. For
1-d data, i.e. totally ordered domains, we know various efficient data structures.
Thanks to these, sweep algorithms succeed in processing n 1-d problems, each of
which may involve n data items, in time
O(ntogn)
rather than in time O(n2).
Unfortunately, this idea does not generalize efficiently to higher dimensions, tf
we sweep 3-d space with a plane, we transform a 3-d problem into a sequence of
2-d problems. The latter generates queries in a 2-d data space, and for these we
rarely have data structures that answer queries in logarithmic time. We leave
the discussion of spatial data structures to the survey in Chapter
7.