Desurvire E. Classical and Quantum Information Theory: An Introduction for the Telecom Scientist

Подождите немного. Документ загружается.

Appendix K (Chapter 13) Capacity of

binary communication channels

In this appendix, I provide the solution of the maximization problem for mutual infor-

mation, which deﬁnes the channel capacity:

C = max

p(x)

H(X ; Y ), (K1)

as applicable to the general case of a binary communication channel (symmetric or

asymmetric), whose transition matrix is deﬁned according to

P(Y |X) =



p(y

) p(y

)

p(y

) p(y

)





a 1 −b

1 − ab



(K2)

where a, b are real numbers in the interval [0, 1]. To recall, the channel mutual infor-

mation H (X; Y )isgivenby

H(X ; Y ) = H (Y ) − H(Y |X ). (K3)

Thus, we must ﬁrst calculate the output-source entropy H (Y ) and the equivocation

entropy H (Y |X ). For this, we need the output probability distribution p(y

), p(y

which is obtained from the transition-matrix elements in Eq. (K2) as follows:

p(y

) = p(y

)p(x

) + p(y

)p(x

)

= aq +(1 −b)(1 −q)

p(y

) = p(y

)p(x

) + p(y

)p(x

)

= (1 −a)q + b(1 − q)

≡ 1 − p(y

(K4)

In Eq. (K4), we have deﬁned the input probability distribution according to p(x

) = q

and p(x

) = 1 − q.FromEq.(K4), we can calculate the output-source entropy:

H(Y ) =−p(y

)logp(y

) − p(y

)logp(y

)

=−p(y

)logp(y

) − [1 − p(y

)] log[1 − p(y

)]

= f [ p(y

)]

= f [aq + (1 −b)(1 − q)],

(K5)

618 Appendix K

where the function f is deﬁned by

f (u) = f (1 − u)

=−u log u − (1 −u)log(1− u),

(K6)

noting that, as usual, the logarithms are in base two.

The next step consists of calculating the equivocation H (Y |X ). For this, we need

the joint distribution p(x, y). Using Bayes’s theorem and the conditional probabilities

shown in Eq. (K2) we obtain:











p(y

, x

) = p(y

)p(x

) = aq

p(y

, x

) = p(y

)p(x

) = (1 − b)(1 − q)

p(y

, x

) = p(y

)p(x

) = (1 − a)q

p(y

, x

) = p(y

)p(x

) = b(1 −q).

(K7)

The results in Eq. (K5) now make it possible to calculate the equivocation H (Y |X):

H(Y |X ) =−



i=1



j=1

p(x

, y

)logp(y

)

=−[aq log a + (1 −a)q log(1 − a)

+ (1 − b)(1 − q)log(1− b) +b(1 − q)logb]

≡ qf(a) +(1 −q) f (b).

(K8)

Substituting Eqs. (K7) and (K8)inEq.(K3), we obtain the mutual information H(X; Y ):

H(X ; Y ) = f [aq + (1 −b)(1 − q)] −qf(a) −(1 − q) f (b). (K9)

Setting q = 0orq = 1 in the result in Eq. (K9) yields, in both cases, H(X ; Y ) = 0. The

case a = 1 −b (or a + b = 1) corresponds to H(X ; Y ) = 0, as can easily be veriﬁed

from Eq. (K9). This means that, regardless of the input probability distribution, the

mutual information is zero. As discussed in the main text, such a channel is useless.

The condition a = 1 − b in the transition matrix (Eq. (K2)) represents the most general

deﬁnition of useless channels, i.e., including but not limited to the case a = b = ε = 1/2.

In the general case, we have H (X ; Y ) ≥ 0, and the function is concave (meaning that

a cord between any two points is always below the maximum). Therefore, the maximum

of H(X ; Y ) ≥ 0 is given by the root of the derivative dH(X ; Y )/

q. Thus, we must

solve

dH (X; Y )

{

f [aq + (1 −b)(1 − q)] −qf(a) − (1 −q) f (b)

}

= 0.

(K10)

Using the deﬁnition of f in Eq. (K8) and going through elementary calculations yields

the following solution for q:

q =

a + b − 1



b −1 +

1 + 2



, (K11)

Capacity of binary communication channels 619

with

W =

f (a) − f (b)

a + b − 1

. (K12)

Concerning the continuity of the above solution in the case a + b = 1, see the note at

the end of this appendix.

The optimal distribution deﬁned in Eqs. (K11) and (K12) can now be substituted into

Eq. (K10). After elementary calculation, we obtain:

C = max

H(X ; Y )

= log(1 + 2

) − W

1 + 2

−q[ f (b) − f (a)] − f (b)

= log(1 + 2

) +

(1 − a) f (b) − bf(a)

a + b − 1

(K13)

Deﬁne

U =

(1 − a) f (b) − bf(a)

a + b − 1

(K14)

and substitute U in Eq. (K13) to get the channel capacity:

C = log(1 + 2

) + U

= log[(1 + 2

]

= log(2

+ 2

U+W

)

≡ log(2

+ 2

(K15)

with

V = U + W =

(1 − b) f (a) − af(b)

a + b − 1

. (K16)

Note

The optimal probability distribution q = p(x

) = 1 − p(x

) deﬁned in Eqs. (K11) and

(K12) seemingly has a pole in a + b = 1. I show herewith that it is actually deﬁned over

the whole plane a, b ∈ [0, 1], namely, that it is analytically deﬁned in the limiting case

a + b = 1. I shall establish this by ﬁrst setting a + b = 1 −ε in Eq. (K11), then using

the Taylor expansion of the function f (a + ε) to the second order, i.e.,

f (a + ε) = f (a) + ε log

1 − a

+ ε

2a(1 −a)

which gives

W = log

1 − a

+ ε

2a(1 −a)

By substituting this result in Eq. (K11), and expanding the exponential term, one

easily ﬁnds 1/(1 + 2

) ≈ a − ε/2, which gives the limit q = 1/2, corresponding to

620 Appendix K

the uniform distribution. However, such a distribution does not maximize the mutual

information H (Y ; X), since we have seen that in the limiting case a + b = 1wehave

H(Y ; X) = 0.

I show next that the channel capacity is also deﬁned over the plane a, b ∈ [0, 1],

including the limiting case a + b = 1. Indeed, using the same Taylor expansion as

previously, we easily obtain

U = (1 −a)log

1 − a

− f (a) +

= log a +

≈ log a.

Substituting this result and 1/(1 + 2

) ≈ a − ε/2 ≈ a into Eq. (K15) yields C =

log(1 +2

) + U ≈−log a + log a = 0, which is the expected channel capacity in the

limiting case a + b = 1. The function V is also found to take the limit V ≈ log(1 − a),

which, from Eq. (K15), also gives C = log(2

+ 2

) ≈ 0.

Appendix L (Chapter 13) Converse

proof of the channel coding theorem

This appendix provides the converse proof of the CCT.

The converse proof must show

that to achieve transmission with arbitrary level accuracy (or error probability), the

condition R ≤ C must be fulﬁlled. The demonstration seeks to establish two properties,

which I shall name A and B.

Property A

To begin with, we must establish the following property, referred to as Fano’s inequality,

according to which:

H(X

) ≤ 1 + p

nR, (L1)

where p

is the error probability of the code ( p

= 1 −

p), i.e., the probability that the

code will output a codeword that is different from the input message codeword.

To demonstrate Fano’s inequality, we deﬁne W = 1 ...2

as the integer that labels

the 2

possible codewords in the input-message codebook. We can view the code

as generating an output integer label W



, to which the label W of the input message

codeword may or may not correspond. We can, thus, write p

= p(W



= W ). We then

deﬁne a binary random variable E, which tells whether or not a codeword error occurred:

E = 1, if W



= W , and E = 0, if W



= W .Thus,wehave p(E = 1) = p

and p(E =

0) = 1 − p

. We can then introduce the conditional entropy H (E, W |Y

), which is the

average information we have on E, W , given the knowledge of the output codeword

source, Y

. Referring back to the chain rule in Eqs. (5.22) and (5.23), we can expand

H(E, W |Y

) in two different ways:

H(E, W |Y

) = H (E|Y

) + H (W |E , Y

)

= H (W |Y

) + H (E |W, Y

(L2)

Since E is given by the combined knowledge of label W and output source Y

,we

have H (E|W, Y

) = 0. We also have H (E|Y

) = H (E), since the only knowledge of

does not condition the knowledge of E. Substituting these results into Eq. (L2), we

Inspired from M. Cover and J. A. Thomas, Elements of Information Theory (New York: John Wiley & Sons,

1991), pp. 203–9.

622 Appendix L

obtain

H(W |Y

) = H (E) + H(W |E, Y

). (L3)

We shall now ﬁnd an upper bound for H (W |Y

). First, we can decompose the second

term in the right-hand side in Eq. (L3)asfollows:

H(W |E, Y

) = p(E = 0)H(W |Y

, E = 0) + p(E = 1)H(W |Y

, E = 1)

= (1 − p

)H (W |Y

, E = 0) + p

H(W |Y

, E = 1).

(L4)

We have H(W |Y

, E = 0) = 0, since knowing Y

and that there is no codeword error is

equivalent to knowing the input codeword label W . Second, we have H(W |Y

, E = 1) ≤

log(2

− 1) < log(2

) = nR, since knowing Y

and that there is a codeword error,

the number of mistaken codeword possibilities is 2

− 1, with uniform probability

q = 1/(2

− 1); hence the entropy can be upper bounded by H



=−log(q) < nR.

Finally, we have H (E) ≤ 1, since E is a binary random variable. Substituting the two

upper bounds into Eq. (L3) yields:

H(W |Y

) ≤ 1 + p

nR (L4)

and

H(X

) ≤ 1 + p

nR, (L5)

since the knowledge of the input message source X

and the codeword label W are

equivalent. Equation (L5) is Fano’s inequality.

Property B

The second tool required for the converse proof of the CCT is the property according

to which the channel capacity per transmission is not increased by passing through

the data several times. Note that this property is true if one assumes that the channel

is memoryless, namely that there is no possible correlation between errors concerning

successive bits or successive codewords.

As I established earlier, the channel capacity for an n-bit codeword is nC, where C

is the capacity of the binary channel, corresponding to a single bit transmission. The

proposed new property can be restated as

H(X

; Y

) ≤ nC. (L6)

The corresponding proof of Eq. (L6) is relatively straightforward. Indeed, we have, by

deﬁnition

H(X

; Y

) = H (Y

) − H (Y

). (L7)

Converse proof of the channel coding theorem 623

Let us now develop the second term in the right-hand side in Eq. (L7)asfollows:

H(Y

) = H(y

) + H (y

, X

) +···+H(y

, y

,...,y

n−1

, X

)



i=1

H(y

, y

,...y

i−1

, X

)



i=1

H(y

(L8)

The ﬁrst equality stems from substituting the extended output source Y

= y

, Y

n−1

= y

, Y

n−2

, etc., (here y

means the binary source of bit or rank k in the codeword),

into the chain rule (Eq. (5.22)) as follows:

H(Y

) ≡ H(y

, Y

n−1

)

= H (y

) + H (Y

n−1

, X

)

= H (y

) + H (y

n−2

, X

)

= H (y

) + H (y

, X

) + H (Y

n−2

, y

, X

)

=···

≡ H (y

) + H (y

, X

) +···+H(y

, y

,...,y

n−1

, X

(L9)

The last equality in Eq. (L8) stems from the fact that, assuming a memoryless chan-

nel, all received bits y

, y

,...,y

i−1

are uncorrelated with the received bit y

, hence

H(y

, y

,...,y

i−1

, X

) = H (y

). Furthermore, the knowledge of y

is not con-

ditioned to that of the input message bits x

, x

,...,x

except for the bit x

of the same

rank i in the codeword, hence H(y

, y

,...,y

i−1

, X

) = H (y

). Substituting the

result in Eq. (L8) into Eq. (L7) yields:

H(X

; Y

) = H (Y

) −



i=1

H(y

). (L10)

Next, we look for an upper bound in the right-hand side in Eq. (L8). We observe that

H(Y

) ≤



i=1

H(y

), the equality standing if the distribution p(y

) was uniform, in

which case we would have H(y

) ≡ H (Y ) and H(Y

) = nH(Y ). Applying the inequality

to Eq. (L10) we ﬁnally obtain

H(X

; Y

) ≤



i=1

H(y

) −



i=1

H(y

)



i=1

[

H(y

) − H (y

)

]

≡



i=1

H(x

; y

(L11)

This result shows that the mutual information between the two transmitted or received

codeword sources is less than or equal to the sum of mutual information between the

transmitted or received bits in the binary channel. Since, by deﬁnition, H(x

; y

) ≤

624 Appendix L

max H(x

; y

) = C, where C is the binary-channel capacity, we also have

H(X

; Y

) ≤ nC. (L12)

This result establishes that the mutual information between extended sources X

, Y

where bits are passed through the channel n times, is no greater than n times the binary-

channel capacity.

The two properties A and B can now be used (ﬁnally!) to establish the converse proof

of the CCT. As before, we assume that the 2

input message codewords are chosen at

random with a uniform distribution, hence H (X

) = nR. By deﬁnition of the mutual

information, H (X

; Y

), and introducing the majoring properties A and B, we obtain

H(X

; Y

) = H (X

) − H (X

)

↔

nR = H (X

) = H (X

; Y

) + H (X

)

nR ≤ nC +1 + p

↔

R ≤ p

R +

+ C.

(L13)

Since the starting assumption is that the error probability of the code, p

, vanishes for

n →∞, the above result asymptotically becomes:

R ≤ C, (L14)

which represents a necessary condition and, hence, proves the converse of the CCT.

Appendix M (Chapter 16) Bloch sphere

representation of the qubit

In this appendix, I show that qubits can be represented by a unique point on the surface

of a sphere, referred to as a Bloch sphere.

As seen from the main text, any qubit can be represented as the vector linear super-

position

|q=α|0+β|1, (M1)

where |0, |1 form an orthonormal basis in the 2D vector space, and α, β are complex

numbers, which represent the qubit coordinates in this space. The length of the qubit

vector |q is, therefore, given by |α|

+|β|

. Since any two complex numbers α, β

can be deﬁned in the exponential representation as α =|a|e

and β =|β|

, one can

write, from Eq. (M1):

|q=α|0+β|1=|a|

|0+|β|

|1 (M2)

Assume next that the qubit vector is unitary, i.e., |α|

+|β|

= 1. Substituting this

property and with the deﬁnition tan(θ/2) =|β|/|α|, we obtain from Eq. (M2):

|q=

|α|



|α|

+|β|

|0+

|β|



|α|

+|β|

|1



1 + tan

(θ/2)

|0+

tan

(θ/2)



1 + tan

(θ/2)

|1

≡ cos

|0+sin

|1,

(M3)

with 0 ≤ θ ≤ π/2.

Finally, introducing γ = θ

and ϕ = θ

− θ

, the qubit takes the form

|q=



cos

|0+sin

|1



. (M4)

Overlooking the argument γ , which only represents an arbitrary (said “unobservable”)

phase shift, we ﬁnally obtain

|q=cos

|0+sin

iϕ

|1. (M5)

626 Appendix M

Figure M1 Qubit represented as point of coordinates (θ,ϕ) on Bloch sphere.

The qubit can, thus, be represented through two angular coordinates θ,ϕ, which deﬁne

the unique position of a point on a sphere of unit radius, which is the Bloch sphere

illustrated in Fig. M1.

It is seen from Fig. M1 that the pure qubits |0 or |1 correspond to θ = 0orθ = π,

respectively, which occupy the north and south poles of the Bloch sphere. The qubit

representation of quantum information, thus, corresponds to an inﬁnity of states located

on the surface of the Bloch sphere. See more on this topic in Appendix N.