
above results into the definition of mutual information H(X; Y) to obtain:

H(X; Y) = H(X) − H(X|Y) ↔
H(X) = H(X; Y) + H(X|Y) ↔                (13.39)
R = H(X; Y) + 0 ≤ max H(X; Y) = C,
which indeed proves the necessary condition R ≤ C. However, this demonstration does
not constitute the actual converse proof of the CCT, since the theorem concerns codes that
are only asymptotically zero-error codes, as the block length n is increased indefinitely.
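As a numerical sanity check of the above entropy identities, the short Python sketch below (our illustration, not part of the original text) builds the joint distribution of a binary symmetric channel and verifies that H(X; Y) = H(X) − H(X|Y); the input distribution q and noise parameter eps are arbitrary choices.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Binary symmetric channel: input X ~ Bernoulli(q), crossover eps.
q, eps = 0.5, 0.001
px = [1 - q, q]

# Joint distribution p(x, y) over {0, 1} x {0, 1}.
pxy = {(x, y): px[x] * (eps if x != y else 1 - eps)
       for x in (0, 1) for y in (0, 1)}
py = [pxy[0, y] + pxy[1, y] for y in (0, 1)]

HX = entropy(px)
HX_given_Y = entropy(pxy.values()) - entropy(py)  # H(X|Y) = H(X,Y) - H(Y)
I = HX - HX_given_Y                               # mutual information H(X;Y)

print(f"H(X)   = {HX:.6f} bits")
print(f"H(X|Y) = {HX_given_Y:.6f} bits (equivocation)")
print(f"H(X;Y) = {I:.6f} bits")  # 0.988592 for q = 0.5, eps = 0.001
```

For a zero-error code the equivocation H(X|Y) vanishes, and the chain of equalities in Eq. (13.39) then bounds R by the maximum of H(X; Y), i.e., by C.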
The second, and formally complete, way of demonstrating the converse proof of the CCT is trickier, as it requires one to establish and use two more properties. The whole exercise may therefore sound overly academic, and it may be skipped, taking for granted that the converse proof exists. But the more demanding or curious reader might want to go the extra mile. To this end, the demonstration of the converse proof is provided in Appendix L.
We shall next look at the practical interpretation of the CCT in the case of noisy, symmetric binary channels. The corresponding channel capacity, as given by Eq. (13.5), is C = 1 − f(ε) = 1 + ε log ε + (1 − ε) log(1 − ε), where ε is the noise parameter or, equivalently, the bit error rate (BER). If we discard the cases of the ideal or noiseless channel (ε = f(ε) = 0) and the useless channel (ε = 0.5, f(ε) = 1), we have 0 < f(ε) < 1 and, hence, a capacity C < 1, or strictly less than one bit per channel use. According to the CCT, the code rate for error-free transmission must satisfy R < C, here with R < C < 1. Thus block codes (n, n) of any length n, for which R = n/n = 1, are not eligible. Block codes (n, k) of any length n for which R = k/n > C are not eligible either. The eligible block codes must satisfy R = k/n < C.
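To make these eligibility conditions concrete, the following minimal Python sketch (our addition, not the book's) evaluates C = 1 − f(ε) for a few noise values:

```python
import math

def bsc_capacity(eps: float) -> float:
    """Capacity C = 1 - f(eps) of a binary symmetric channel,
    in bits per channel use (f = binary entropy function)."""
    if eps in (0.0, 1.0):
        return 1.0
    f = -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)
    return 1.0 - f

for eps in (1e-4, 1e-3, 1e-2, 0.1, 0.5):
    print(f"eps = {eps:6} -> C = {bsc_capacity(eps):.4f}")
# eps = 0.001 gives C = 0.9886: any eligible (n, k) block code
# must then have R = k/n < 0.9886.
```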
To provide an example, assume for instance ε = 0.001, corresponding to BER = 1 × 10⁻³. We obtain C = 0.9886, hence the CCT condition is R = k/n < 0.9886. Consider now the following possibilities:
First, according to Chapter 11, the smallest linear block code, (7, 4), has a length n = 7 and a rate R = 4/7 ≈ 0.57. We note that although this code is consistent with the condition R < C, it is a poor choice, since we must use n − k = 3 parity bits for k = 4 payload bits, which represents a heavy price for bit-safe communication. But let us consider this example for the sake of illustration. As a Hamming code of distance d_min = 3, it has, as we have learnt, the capability of correcting no more than (d_min − 1)/2 = 1 bit error per seven-bit block. If a single error occurs in a block sequence of 1001 bits (143 blocks), corresponding to BER ≈ 1 × 10⁻³, this error will be corrected with certainty, with a resulting BER of 0. If, in a block sequence of 2002 bits, exactly two errors occur within the same block, then only one error will be corrected, and the corrected BER will be reduced to BER ≈ 0.5 × 10⁻³ = 5 × 10⁻⁴.
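This single-error-corrects, double-error-fails behaviour is easy to reproduce. The sketch below (our illustration, using standard syndrome decoding of the (7, 4) Hamming code, with the all-zero codeword transmitted for simplicity) shows that one error per block is always repaired, while two errors in the same block defeat the decoder:

```python
# Columns of the parity-check matrix: column j (1-based) is the
# binary representation of j, so a single error at position j
# produces syndrome j.
H_COLS = [((j >> 2) & 1, (j >> 1) & 1, j & 1) for j in range(1, 8)]

def syndrome(word):
    """3-bit syndrome of a 7-bit word, returned as an integer 0..7."""
    s = (0, 0, 0)
    for j, bit in enumerate(word):
        if bit:
            s = tuple(a ^ b for a, b in zip(s, H_COLS[j]))
    return 4 * s[0] + 2 * s[1] + s[2]

def decode(word):
    """Flip the single bit the syndrome points at (if nonzero)."""
    w = list(word)
    s = syndrome(w)
    if s:
        w[s - 1] ^= 1
    return w

sent = [0] * 7                      # all-zero codeword

one_err = list(sent); one_err[3] ^= 1
assert decode(one_err) == sent      # single error: fully corrected

two_err = list(sent); two_err[1] ^= 1; two_err[5] ^= 1
assert decode(two_err) != sent      # two errors: decoding fails
print("1 error corrected; 2 errors per block are not")
```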
Since the condition BER ≈ 0 is not achieved when there is more than one error per block, this code is, therefore, not optimal in view of the CCT, despite the fact that it satisfies the condition R < C. Note that the actual corrected error rate (which takes into account all possible error events) is BER < 4 × 10⁻⁶, which is significantly smaller, yet not identical to zero; showing this is left as an exercise.
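As a rough numerical check of that exercise (our sketch, not the book's solution), one may keep the simplifying model used above, namely that a block hit by j ≥ 2 errors leaves the decoder with j − 1 residual bit errors, and average over the binomial error statistics of the channel:

```python
from math import comb

def residual_ber(eps: float, n: int = 7) -> float:
    """Post-decoding BER of the single-error-correcting (7, 4)
    Hamming code on a BSC, assuming j >= 2 channel errors per
    n-bit block leave j - 1 residual errors after decoding."""
    total = 0.0
    for j in range(2, n + 1):
        p_j = comb(n, j) * eps**j * (1 - eps)**(n - j)
        total += (j - 1) * p_j
    return total / n          # residual errors per transmitted bit

print(f"{residual_ber(1e-3):.2e}")   # about 3.0e-06
```

Under this model the residual BER for ε = 10⁻³ comes out near 3 × 10⁻⁶, consistent with the BER < 4 × 10⁻⁶ quoted above; counting decoder miscorrections differently changes the constant but not the order of magnitude.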