It matters little which of these forms is adopted since they can be derived
from one another. (The branching axiom is illustrated later in this chapter by
Example 3.6 and Figure 3.6.)
Axiom (S9) Normalization. To ensure (if desirable) that the measurement
units of S are bits, it is essential that
$$S\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 1.$$
This axiom must be appropriately modified when other measurement units are
preferred.
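As a quick numerical illustration (not part of the original text), the following sketch computes the Shannon entropy in bits and confirms the normalization axiom: two equally likely outcomes carry exactly one bit of uncertainty. The helper name `shannon_entropy` is illustrative.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: S(p_1, ..., p_n) = -sum_i p_i * log2(p_i)."""
    # Terms with p_i = 0 contribute nothing, by the convention 0 * log 0 = 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Normalization (S9): S(1/2, 1/2) = 1 bit.
print(shannon_entropy([0.5, 0.5]))  # 1.0
```

Using a logarithm base other than 2 (e.g., natural log for nats) corresponds to the modified normalization mentioned above.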
The listed axioms for a probabilistic measure of uncertainty and information
are extensively discussed in the abundant literature on classical information
theory. The following subsets of these axioms are the best-known examples
of axiomatic characterizations of the probabilistic measure of uncertainty:
1. Continuity, weak additivity, monotonicity, branching, and normalization.
2. Expansibility, continuity, maximum, branching, and normalization.
3. Symmetry, continuity, branching, and normalization.
4. Expansibility, symmetry, continuity, subadditivity, additivity, and
normalization.
Any of these collections of axioms (as well as some additional collections)
is sufficient to characterize the Shannon entropy uniquely. That is, it has been
proven that the Shannon entropy is the only functional that satisfies any of
these sets of axioms. To illustrate this important issue of uniqueness in
detail, which gives the Shannon entropy its great significance, the uniqueness
proof is presented here for the first of the listed sets of axioms.
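Before the formal proof, a numerical sanity check (not from the original text) may help: for the function $f(n) = S(1/n, \ldots, 1/n)$ used in the weak additivity axiom, weak additivity demands $f(nm) = f(n) + f(m)$, and for the Shannon entropy $f(n) = \log_2 n$. The helper names `entropy_bits` and `f` are illustrative.

```python
import math

def entropy_bits(probs):
    # Shannon entropy in bits.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def f(n):
    # f(n) = S(1/n, ..., 1/n): entropy of the uniform distribution on n outcomes.
    return entropy_bits([1.0 / n] * n)

# Weak additivity: f(n*m) = f(n) + f(m), consistent with f(n) = log2(n).
for n, m in [(2, 3), (4, 5)]:
    assert math.isclose(f(n * m), f(n) + f(m))
    assert math.isclose(f(n), math.log2(n))
```

This check, of course, only confirms that the Shannon entropy satisfies the axioms; the theorem below establishes the converse, that no other functional does.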
Theorem 3.1. The only functional that satisfies the axioms of continuity, weak
additivity, monotonicity, branching, and normalization is the Shannon entropy.
Proof. (i) First, we prove the proposition $f(n^k) = kf(n)$ for all positive integers
$n$ and $k$ by induction on $k$, where
$$f(n) = S\left(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}\right)$$
is the function that is used in the definition of weak additivity. For $k = 1$, the
proposition is trivially true. By the axiom of weak additivity, we have