Shannon’s entropy 571
There exist two possible cases:
r
u is an integer: then E(u) = u = n log T/ log S, and the first inequality in Eq. (B12)
intrinsically holds regardless of the size of n. The second inequality converts to
A(T )
A(S)
−
log T
log S
<ε, (B15)
which, in the limit of large n, leads to the conclusion expressed in Eqs. (B12) and
(B13);
r
u is not integer: then E(u) = u + x = n log T / log S + x, where 0 < x < 1. In this
case, the first inequality intrinsically holds regardless of the size of n, and the second
inequality converts to
A(T )
A(S)
−
log T
log S
<ε(1 + x) < 2ε (B16)
which leads to the same conclusion.
2
Step 3
Assume that the source now contains N equiprobable possibilities. Its information mea-
sure is, therefore, H(1/N, 1/N,...,1/N ) = A(N ) = K log N . We can arbitrarily group
these possibilities into m subgroups having n
i
elements each, and whose associated prob-
abilities are p
i
= n
i
/N . According to condition (3), if we define a first partition of two
subgroups of length n
1
and n
2
= N − n
1
, with probabilities p
1
= n
1
/N and p
2
= n
2
/N ,
respectively, we have
A(N ) = H
(
p
1
, p
2
)
+ p
1
A(n
1
) + p
2
A(n
2
). (B17)
In the right-hand development of Eq. (B17), the first term corresponds to the information
provided by this partition and the second two terms corresponds to the information
contained in the two subgroups (weighted by their respective occurrence probabilities
p
1
, p
2
). Note that if any subgroup has only one element (e.g., n
1
= 1), the corresponding
information vanishes, since A(1) = 0.
We can then continue this partitioning by splitting the second subgroup into two new
subgroups with n
2
and n
3
elements (n
2
+ n
3
= N − n
1
) and respective probabilities
p
2
= n
2
/N , p
3
= n
3
/N , yielding the information decomposition:
A(N ) = H
(
p
1
, p
2
, p
3
)
+ p
1
A(n
1
) + p
2
A(n
2
) + p
3
A(n
3
). (B18)
2
It is interesting to note that the Appendix in C. E. Shannon, A mathematical theory of communication.
Bell Syst. Tech. J., 27 (1948), 79–423, 623–56 proceeds differently: Eqs. (B8) and (B11) are written in
the alternative forms |log T / log S − m/n| <εand | A(T )/A(S) − m/n| <ε, respectively. From there, it is
directly concluded that | A(T )/A(S) − log T / log S| < 2ε, which is far from o bvious (as the reader might
easily check). As a matter of fact, this last inequality summarizes at once the results in Eqs. (B15) and (B16)
without providing any of the details. This apparent omission could be explained by the author’s concer n to
save room in a very mathematically intensive paper or, as another possibility, to challenge the reader with
the proof.