NOTES
3.1. As is well described by Hacking [1975], the concept of numerical probability
emerged in the mid-17th century. However, its adequate formalization was
achieved only in the 20th century by Kolmogorov [1950]. This formalization is
based on classical measure theory [Halmos, 1950]. The literature dealing with
probability theory and its applications is copious. Perhaps the most comprehensive study of the foundations of probability was made by Fine [1973]. Among the enormous number of other books published on the subject, it makes sense to mention
just a few that seem to be significant in various respects: Billingsley [1986], De
Finetti [1974, 1975], Feller [1950, 1966], Gnedenko [1962], Jaynes [2003], Jeffreys
[1939], Reichenbach [1949], Rényi [1970a, b], Savage [1972].
3.2. A justified way of measuring uncertainty and uncertainty-based information in
probability theory was established in a series of papers by Shannon [1948]. These
papers, which are also reprinted in the small book by Shannon and Weaver [1949],
opened a way for developing the classical probability-based information theory.
Among the many books providing general coverage of the theory, particularly
notable are classical books by Ash [1965], Billingsley [1965], Csiszár and Körner
[1981], Feinstein [1958], Goldman [1953], Guiasu [1977], Jelinek [1968], Jones
[1979], Khinchin [1957], Kullback [1959], Martin and England [1981], Reza [1961],
and Yaglom and Yaglom [1983], as well as more recent books by Blahut [1987],
Cover and Thomas [1991], Gray [1990], Ihara [1993], Kåhre [2002], Mansuripur
[1987], and Yeung [2002]. The role of information theory in science is well
described in books by Brillouin [1956, 1964] and Watanabe [1969]. Other books
focus on more specific areas, such as economics [Batten, 1983; Georgescu-Roegen,
1971; Theil, 1967], engineering [Bell, 1953; Reza, 1961], chemistry [Eckschlager,
1979], biology [Gatlin, 1972], psychology [Attneave, 1959; Garner, 1962; Quastler,
1955; Weltner, 1973], geography [Webber, 1979], and other areas [Hyvärinen,
1968; Kogan, 1988; Moles, 1966; Yu, 1976]. Useful resources for major papers on
classical information theory that were published in the 20th century are the
books edited by Slepian [1974] and Verdú and McLaughlin [2000]. Claude
Shannon’s contributions to classical information theory are well documented in
[Sloane and Wyner, 1993]. Most current contributions to classical information
theory are published in the IEEE Transactions on Information Theory. Some
additional books on classical information theory, not listed here, are included in
the Bibliography.
3.3. Various subsets of the axioms for a probabilistic measure of uncertainty that are
presented in Section 3.2.2 were shown to be sufficient to guarantee the uniqueness of the Shannon entropy by Feinstein [1958], Forte [1975], Khinchin [1957], Rényi
[1970b], and others. The uniqueness proof presented as Theorem 3.1 is adopted
from a book by Ash [1965]. Excellent overviews of the various axiomatic treat-
ments of Shannon entropy can be found in books by Aczél and Daróczy [1975],
Ebanks et al. [1997], and Mathai and Rathie [1975]. All these books are based
heavily on the use of functional equations. An excellent and comprehensive monograph on functional equations was prepared by Aczél [1966].
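As a concrete illustration of the functional whose uniqueness these axiomatic treatments establish, the following is a minimal Python sketch (the function name and the test distributions are illustrative, not from the text) computing the Shannon entropy H(p) = -Σ p_i log2 p_i and checking two of the familiar axiomatic properties: maximality at the uniform distribution and additivity over independent experiments.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum(p_i * log2(p_i)), measured in bits.
    Terms with p_i == 0 contribute nothing (the 0 * log 0 = 0 convention)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Normalization: a fair coin carries exactly one bit.
print(shannon_entropy([0.5, 0.5]))   # 1.0

# Maximality: on n outcomes, the uniform distribution maximizes H,
# with H = log2(n); here n = 4.
print(shannon_entropy([0.25] * 4))   # 2.0

# Additivity: for the product of two independent experiments,
# H(p x q) = H(p) + H(q).
joint = [pi * qj for pi in [0.5, 0.5] for qj in [0.25] * 4]
print(shannon_entropy(joint))        # 3.0
```

These three properties are among the axioms (in various subsets) that the uniqueness proofs cited above take as their starting point.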
3.4. Several classes of functionals that subsume the Shannon entropy as a special case
have been proposed and studied. They include: