
42 CHAPTER 4. RECURSIVE SPECTRAL CLUSTERING
1. The conductance of each C
i
is at least α.
2. The total weight of inter-cluster edges is at most an fraction of the total
edge weight.
Associated with this bicriteria measure is the following optimization prob-
lem: (P1) Given α, find an (α, )-clustering that minimizes (alternatively, we
have (P2) Given , find an (α, )-clustering that maximizes α). We note that
the number of clusters is not restricted.
4.3 Approximation Algorithms
Problem (P1) is NP-hard. To see this, consider maximizing α with set to
zero. This problem is equivalent to finding the conductance of a given graph
which is well known to be NP-hard [GJ79]. We consider the following heuristic
approach.
Algorithm: Recursive-Cluster
1. Find a cut that approximates the minimum conductance cut
in G.
2. If the conductance of the cut obtained is below a preset
threshold, recurse on the pieces induced by the cut.
The idea behind our algorithm is simple. Given G, find a cut (S,
¯
S) of
minimum conductance. Then recurse on the subgraphs induced by S and
¯
S.
Finding a cut of minimum conductance is hard, and hence we need to use an
approximately minimum cut. There are two well-known approximations for
the minimum conductance cut, one is based on a semidefinite programming
relaxation (and precurson on a linear programming relaxation) and the other
is derived from the second eigenvector of the graph. Before we discuss these
approximations, we present a general theorem that captures both for the purpose
of analyzing the clustering heuristic.
Let A be an approximation algorithm that produces a cut of conductance at
most Kx
ν
if the minimum conductance is x, where K is independent of x (K
could be a function of n, for example) and ν is a fixed constant between between
0 and 1. The following theorem provides a guarantee for the approximate-cluster
algorithm using A as a subroutine.
Theorem 4.3. If G has an (α, )-clustering, then the recursive-cluster algo-
rithm, using approximation algorithm A as a subroutine, will find a clustering
of quality
α
6K log
n
1/ν
, (12K + 2)
ν
log
n
!
.