10.5 BPP, NP, and the Polynomial Hierarchy 139
worthwhile to recall the differences between the underlying randomized algo-
rithms of these two classes:
•
NP algorithms: one-sided error, but large error-probability, e.g. 1 − 2
−n
.
•
BPP algorithms: error-probability severely limited, e.g. by 2
−n
, but two-
sided error.
So it is at least possible that these classes are incomparable with respect to
the subset relation. On the other hand, our intuition is that
BPP is “not much
bigger” than
P, and so the inclusion BPP ⊆ NP would add to our picture of
the complexity landscape without shaking the prevailing hypotheses such as
NP = P. With respect to the polynomial hierarchy, the best known result is
that
BPP ⊆ Σ
2
∩ Π
2
. We will present the proof of this result in such a way
that we can draw further consequences from it.
Theorem 10.5.1.
BPP ⊆ Σ
2
∩ Π
2
.
Proof. Since by definition
BPP = co-BPP, it is sufficient to show that BPP ⊆
Σ
2
. It then follows immediately that BPP = co-BPP ⊆ co-Σ
2
= Π
2
.
So let L ∈
BPP be given. By Theorem 3.3.6 there is a randomized algorithm
for L with polynomial worst-case runtime and an error-probability bounded by
2
−(n+1)
. Furthermore, we can assume that every computation path has length
p(|x|)andthatp(n) is divisible by n. Since in the analysis that follows we will
need the inequality p(n)/n ≤ 2
n
, we will first deal with the at most finitely
many input lengths for which this is not the case. For these finitely many
inputs, a polynomial-time algorithm can simulate the randomized algorithm
on all computation paths and compute the correct result without losing the
property of being a polynomial-time algorithm.
For each input x of length n by our assumptions there are exactly 2
p(n)
computation paths of the BPP algorithm. Because of the small error-rate, only
very few of these, namely at most 2
p(n)−(n+1)
many, can give the wrong result.
For an input x we will let A(x) be the set of computation paths r ∈{0, 1}
p(n)
on which the BPP algorithm accepts, and N(x) the set of remaining paths.
For all x ∈ L, A(x) is much larger than N(x). So for “significantly many”
x ∈ L, there must in fact be a common accepting computation path. On the
other hand, for x/∈ L, the set A(x) is very small. We want to take advantage
of this difference.
Let k(n) be a size to be specified later. We will abbreviate k(n)ask and
p(n)asp in order to simplify the formulas. Let B be the language of all
triples (x, r, z) consisting of an input x ∈{0, 1}
n
for the decision problem
L, k computation paths r
1
,...,r
k
∈{0, 1}
p
, and a so-called computation
path transformation z ∈{0, 1}
p
, for which r
i
⊕ z is in A(x) for at least
one i.Here⊕ stands for the component-wise exclusive or on vectors from
{0, 1}
p
. The function h
z
(r):=r ⊕ z is a bijective function onto the set {0, 1}
p
of computation paths. Since in deterministic polynomial time it is possible
to simulate a randomized algorithm with polynomially-bounded runtime on