3.2. The Method of Conditional Probabilities 43
the derandomization procedure. With each such state we associate a weight. The
weight w(d_1, ..., d_i) is defined as the conditional expectation E[Z | y_1 =
d_1, ..., y_i = d_i]. In other words, we consider a modification of A where only
y_{i+1}, ..., y_n are chosen randomly with probabilities p_{i+1}, ..., p_n and where the
bits y_1, ..., y_i have been fixed to d_1, ..., d_i. Then the weight w(d_1, ..., d_i) is the
expected output of this modified algorithm.
In particular, we have w() = E[Z]. Furthermore, w(d_1, ..., d_n) = E[Z | y_1 =
d_1, ..., y_n = d_n] is the output of A when all bits y_1, ..., y_n are fixed, i.e. the
output of a deterministic algorithm.
The derandomization by the method of conditional probabilities is based on the
assumption that the weights w(d_1, ..., d_i) can be computed in polynomial time.
If this is not possible, one may try to estimate the weights. This approach is called
the method of pessimistic estimators. An example of an application of this method
is presented in Section 3.4.
Now we can give an outline of the derandomized algorithm.
for i := 1 to n do
    if w(d_1, ..., d_{i-1}, 0) > w(d_1, ..., d_{i-1}, 1)
        then d_i := 0
        else d_i := 1
Run the algorithm A where the random choices are replaced by d_1, ..., d_n and
output the result of A.
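As a concrete illustration, the outline above can be sketched in Python. The objective Z and all function names below are hypothetical, not from the text, and the weights w(d_1, ..., d_i) are computed by brute-force enumeration over the remaining random bits; this is only a stand-in for the polynomial-time computation the method assumes.

```python
import itertools

def conditional_expectation(fixed, n, p, Z):
    """w(d_1, ..., d_i): expectation of Z over the remaining random bits.
    Computed by brute-force enumeration -- a stand-in for the assumed
    polynomial-time computation."""
    i = len(fixed)
    total = 0.0
    for tail in itertools.product([0, 1], repeat=n - i):
        prob = 1.0
        for j, b in enumerate(tail):
            prob *= p[i + j] if b == 1 else 1.0 - p[i + j]
        total += prob * Z(list(fixed) + list(tail))
    return total

def derandomize(n, p, Z):
    """The greedy bit-fixing loop from the outline above."""
    d = []
    for i in range(n):
        w0 = conditional_expectation(d + [0], n, p, Z)
        w1 = conditional_expectation(d + [1], n, p, Z)
        d.append(0 if w0 > w1 else 1)
    return d

# Hypothetical objective: the number of satisfied clauses of
# (y_1 or y_2) and (not y_1 or y_3).
def Z(y):
    return (y[0] | y[1]) + ((1 - y[0]) | y[2])

d = derandomize(3, [0.5, 0.5, 0.5], Z)
# The deterministic output is at least E[Z] = w() = 1.5.
assert Z(d) >= conditional_expectation([], 3, [0.5] * 3, Z)
```

On this toy instance the greedy loop fixes the bits one by one, never letting the conditional expectation drop, so the final deterministic assignment satisfies both clauses.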
In order to prove that the output of A for this choice of d_1, ..., d_n is at least E[Z],
we consider the relation between w(d_1, ..., d_i) and the weights w(d_1, ..., d_i, 0)
and w(d_1, ..., d_i, 1). A simple manipulation yields

  w(d_1, ..., d_i) = E[Z | y_1 = d_1, ..., y_i = d_i]
                   = (1 - p_{i+1}) E[Z | y_1 = d_1, ..., y_i = d_i, y_{i+1} = 0]
                     + p_{i+1} E[Z | y_1 = d_1, ..., y_i = d_i, y_{i+1} = 1]
                   = (1 - p_{i+1}) w(d_1, ..., d_i, 0) + p_{i+1} w(d_1, ..., d_i, 1).   (3.1)
This equality should also be intuitively clear. It says that after choosing y_1 =
d_1, ..., y_i = d_i the following two procedures yield the same result:
1. We randomly choose y_{i+1}, ..., y_n and compute the expectation of the output.
2. First we choose y_{i+1} = 0 and afterwards y_{i+2}, ..., y_n randomly, and compute
the expectation of the output. Then we choose y_{i+1} = 1 and afterwards
y_{i+2}, ..., y_n randomly, and compute the expectation of the output. Finally,
we compute the weighted mean of both expectations.
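Identity (3.1) can also be checked numerically. The self-contained sketch below uses a hypothetical 3-bit objective Z and arbitrary biases p (both illustrative, not from the text), computes each weight by brute-force enumeration, and confirms that w(d_1, ..., d_i) equals the p_{i+1}-weighted mean of w(d_1, ..., d_i, 0) and w(d_1, ..., d_i, 1).

```python
import itertools

def w(fixed, n, p, Z):
    """Conditional expectation E[Z | y_1, ..., y_i fixed], by enumeration."""
    i = len(fixed)
    total = 0.0
    for tail in itertools.product([0, 1], repeat=n - i):
        prob = 1.0
        for j, b in enumerate(tail):
            prob *= p[i + j] if b == 1 else 1.0 - p[i + j]
        total += prob * Z(list(fixed) + list(tail))
    return total

# Hypothetical objective and biases, chosen only to exercise (3.1).
def Z(y):
    return y[0] + y[1] * y[2]

p = [0.3, 0.6, 0.8]
d = [1]            # prefix d_1 = 1, so i = 1 and p_{i+1} = p[1]
i = len(d)
lhs = w(d, 3, p, Z)
rhs = (1 - p[i]) * w(d + [0], 3, p, Z) + p[i] * w(d + [1], 3, p, Z)
# Both sides agree, as identity (3.1) asserts.
assert abs(lhs - rhs) < 1e-9
```

Here lhs is procedure 1 (average over both remaining bits at once) and rhs is procedure 2 (condition on y_2, then average), matching the intuition above.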
From (3.1) we conclude

  w(d_1, ..., d_i) <= max{w(d_1, ..., d_i, 0), w(d_1, ..., d_i, 1)},   (3.2)