
From this (through a series of computations), we have, for j = 1, . . . , n,
\[
(U_k^T S^T S U_k)\,\beta_j = U_k^T S^T S w_j
\]
Now we choose r large enough so that σ²(SU) ≥ 1/√2 with probability at least 7/8, and hence
\[
\|A - D\|_F^2 \;=\; \sum_{j=1}^{n} \beta_j^2
\;\le\; 2 \sum_{j=1}^{n} \|U_k^T S^T S w_j\|^2
\;\le\; 2 \sum_{j=1}^{n} \|w_j\|^2
\;=\; 2\,\|A - A_k\|_F^2 .
\]
Here, in the penultimate step, we used the fact that a random projection approximately preserves inner products, i.e., given that w_j is orthogonal to U_k,
\[
|U_k^T S^T S w_j| \;\le\; \frac{\varepsilon}{2}\,\|w_j\|_2 .
\]
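To make the isotropic RP guarantee concrete, here is a minimal numerical sketch (in Python with numpy; the Gaussian choice of S, the sketch size r = 4k, and the variable names are our own assumptions, not taken from the text). It sketches A with a random S, finds the best rank-k approximation of A whose rows lie in the row span of SA, and compares the error against √2‖A − A_k‖_F:

```python
import numpy as np

rng = np.random.default_rng(0)

# Test matrix: low rank plus small noise, m x n.
m, n, k = 200, 100, 5
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A += 0.1 * rng.standard_normal((m, n))

# Best rank-k approximation A_k via the SVD, for reference.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Isotropic random projection: sketch A with a Gaussian S (r x m),
# then take the best rank-k approximation of A restricted to the
# row span of SA.
r = 4 * k                          # hypothetical sketch size
S = rng.standard_normal((r, m)) / np.sqrt(r)
Q, _ = np.linalg.qr((S @ A).T)     # orthonormal basis of rowspan(SA)
B = A @ Q                          # rows of A expressed in that basis
Ub, sb, Vbt = np.linalg.svd(B, full_matrices=False)
D = ((Ub[:, :k] * sb[:k]) @ Vbt[:k, :]) @ Q.T

err_opt = np.linalg.norm(A - A_k, "fro")
err_rp = np.linalg.norm(A - D, "fro")
print(f"optimal: {err_opt:.4f}  RP: {err_rp:.4f}  "
      f"bound: {np.sqrt(2) * err_opt:.4f}")
```

Since the guarantee holds only with constant probability for a given sketch size, the printed RP error may occasionally exceed the bound; increasing r makes this rare.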
Exercise 7.4. Let A be an m × n matrix with m > n and let A = UΣV^T be its SVD. Let b ∈ R^m. Then the point x* which minimizes ‖Ax − b‖ is given by
\[
x^{*} = V \Sigma^{-1} U^T b .
\]
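As a quick sanity check of this formula (a sketch assuming A has full column rank, so that Σ is invertible; the test setup is ours), one can compare it with numpy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 8
A = rng.standard_normal((m, n))   # full column rank with probability 1
b = rng.standard_normal(m)

# Solution via the SVD formula: x* = V Sigma^{-1} U^T b.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)

# Reference solution from the library least-squares routine.
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.allclose(x_svd, x_lstsq))   # True
```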
7.4 Discussion
In this chapter we saw asymptotically tight bounds on the number of rows/columns
whose span contains a near-optimal rank-k approximation of a given matrix. We
also saw two different algorithms for obtaining such an approximation efficiently.
Adaptive sampling was introduced in [DRVW06], volume sampling in [DV06]
and isotropic RP in [Sar06].
The existence of such sparse interpolative approximations has a nice application to clustering. Given a set of points in R^n and integers j, k, the projective clustering problem asks for a set of j k-dimensional subspaces such that the sum of squared distances of each point to its nearest subspace is minimized. Other objective functions, e.g., the maximum distance or the sum of distances, have also been studied. The interpolative approximation suggests a simple enumerative algorithm, sketched below: the optimal set of subspaces induces a partition of the point set; for each part, the subspace is given by the best rank-k approximation of the subset (the SVD subspace). From the theorems of this chapter, we know that a good approximation to the latter lies in the span of a small number (k/ε) of points. So we simply enumerate over all subsets of points of this size, choosing j of them at a time. For each such choice, we have to consider all "distinct" k-dimensional subspaces in their span; this can be achieved with a discrete set of subspaces whose size is exponential, but only in k and ε. For each choice of j k-dimensional subspaces, we compute the value of the objective function and output the overall minimum.
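The following toy implementation illustrates the enumerative idea (Python; the instance, parameter names, and one simplification are ours: each candidate subspace is taken to be the SVD subspace of a subset directly, rather than drawn from a discretized net of subspaces in the subset's span):

```python
import itertools
import numpy as np

def subspace_cost(points, bases):
    """Sum of squared distances from each point to its nearest subspace.

    bases: matrices of shape (n, k) with orthonormal columns."""
    cost = 0.0
    for p in points:
        # Distance to a subspace with orthonormal basis B: ||p - B B^T p||.
        cost += min(np.sum((p - B @ (B.T @ p)) ** 2) for B in bases)
    return cost

def projective_clustering_enum(points, j, k, subset_size):
    """Build candidate k-dim subspaces from all small subsets of points,
    then try every way of picking j of them; return the best cost found."""
    candidates = []
    for idx in itertools.combinations(range(len(points)), subset_size):
        M = points[list(idx)]
        # Top-k right singular vectors span the candidate subspace.
        Vt = np.linalg.svd(M, full_matrices=False)[2]
        candidates.append(Vt[:k].T)
    return min(subspace_cost(points, bases)
               for bases in itertools.combinations(candidates, j))

rng = np.random.default_rng(2)
# Toy data: two clusters, each near its own 1-dim subspace of R^3.
pts = np.vstack([np.outer(rng.standard_normal(8), [1, 0, 0]),
                 np.outer(rng.standard_normal(8), [0, 1, 0])])
pts += 0.01 * rng.standard_normal(pts.shape)
print(projective_clustering_enum(pts, j=2, k=1, subset_size=2))
```

Even in this simplified form the enumeration is only practical for tiny instances; with subsets of size k/ε, the number of choices grows roughly like the number of ways to pick j such subsets, which is polynomial in the number of points only for fixed j, k, and ε.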
It is an open question to implement exact volume sampling efficiently, i.e.,
in time polynomial in both n and k. Another open question is to approximate a
given matrix efficiently (nearly linear time or better) while incurring low error
in the spectral norm.