which is easy to memorize if a condition |Z is applied to both sides of the definition of
joint entropy, H(X, Y) = H(X) + H(Y|X). The second chain rule,
H(X, Y|Z) = H(Y|Z) + H(X|Y, Z), (5.23)
comes from permuting the sources X, Y in Eq. (5.22), since the joint entropy
H(X, Y) is symmetric with respect to its arguments.
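As a numerical sanity check, the short Python sketch below verifies the chain rule of Eq. (5.23) on a randomly generated joint distribution p(x, y, z); the array dimensions and the helper function H are illustrative assumptions, not part of the text.

    import numpy as np

    rng = np.random.default_rng(0)
    p = rng.random((3, 4, 2))
    p /= p.sum()                              # joint pmf p(x, y, z), strictly positive here

    def H(q):
        """Shannon entropy (in bits) of a pmf given as an array."""
        q = q[q > 0]
        return float(-(q * np.log2(q)).sum())

    p_z = p.sum(axis=(0, 1))                  # marginal p(z)
    p_yz = p.sum(axis=0)                      # marginal p(y, z)

    # Conditional entropies computed directly as averages over the conditioning variable:
    H_XY_given_Z = sum(p_z[k] * H(p[:, :, k] / p_z[k]) for k in range(p.shape[2]))
    H_Y_given_Z = sum(p_z[k] * H(p_yz[:, k] / p_z[k]) for k in range(p.shape[2]))
    H_X_given_YZ = sum(p_yz[j, k] * H(p[:, j, k] / p_yz[j, k])
                       for j in range(p.shape[1]) for k in range(p.shape[2]))

    # Chain rule of Eq. (5.23): H(X,Y|Z) = H(Y|Z) + H(X|Y,Z)
    assert np.isclose(H_XY_given_Z, H_Y_given_Z + H_X_given_YZ)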
The lesson learnt from using Venn diagrams is that there is, in fact, little to
memorize, as long as we are allowed to make drawings! The only general rule to
remember is:
H(U|Z ) is equal to the entropy H(U) defined by the source U (for instance, U = X, Y or
U = X; Y ) minus the entropy H(Z ) defined by the source Z , the reverse being true for H(Z |U).
But the use of Venn diagrams requires us not to forget the unique correspondence between the
ensemble or Boolean operators (∪, ∩, ¬) and the separators (, ; |) in the entropy-function arguments.
5.3 Relative entropy
In this section, I introduce the notion of distance between two event sources and the
associated concept of relative entropy.
The mathematical concept of distance between two real variables x, y is familiarly
known as the quantity d = |x − y|. For two points A, B in the plane, with coordinates
(x_A, y_A) and (x_B, y_B), respectively, the distance is defined as
d = √[(x_A − x_B)² + (y_A − y_B)²].
More generally, any definition of distance d(X, Y ) between two entities X, Y must
obey four axiomatic principles:
(a) Positivity, d(X, Y ) ≥ 0;
(b) Symmetry, d(X, Y ) = d(Y, X);
(c) Nullity for self, d(X, X) = 0;
(d) Triangle inequality, d(X, Z) ≤ d(X, Y) + d(Y, Z).
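As a quick, purely illustrative check, the following Python snippet verifies axioms (a)–(d) numerically for the planar distance defined above, using three arbitrary random points; the function name dist is a hypothetical helper.

    import math, random

    def dist(a, b):
        """Euclidean distance between points a = (x_a, y_a) and b = (x_b, y_b)."""
        return math.hypot(a[0] - b[0], a[1] - b[1])

    A, B, C = [(random.random(), random.random()) for _ in range(3)]
    assert dist(A, B) >= 0                                    # (a) positivity
    assert math.isclose(dist(A, B), dist(B, A))               # (b) symmetry
    assert dist(A, A) == 0                                    # (c) nullity for self
    assert dist(A, C) <= dist(A, B) + dist(B, C) + 1e-12      # (d) triangle inequality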
Consider now the quantity D(X, Y ), which we define as
D(X, Y ) = H (X, Y ) − H (X ; Y ). (5.24)
From the visual reference of the Venn diagrams in Fig. 5.2 (top), it is readily verified
that D(X, Y) satisfies at least the first three of the above distance axioms, (a), (b), and (c). The
last axiom, (d), the triangle inequality, can also be proven through the Venn diagrams
by considering three ensembles X, Y, Z, which I leave here as an exercise. Therefore,
D(X, Y) represents a distance between the two sources X, Y.
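Although the proof of the triangle inequality is left as an exercise, a small numerical experiment can make the result plausible. The sketch below, assuming the same kind of random joint distribution p(x, y, z) as before, evaluates D on the pairwise marginals and checks all four axioms; the helpers H and D are illustrative, and such a check is of course not a substitute for the proof.

    import numpy as np

    rng = np.random.default_rng(1)
    p = rng.random((3, 3, 3))
    p /= p.sum()                              # joint pmf p(x, y, z)

    def H(q):
        """Shannon entropy (in bits)."""
        q = q[q > 0]
        return float(-(q * np.log2(q)).sum())

    def D(p_uv):
        """D(U,V) = H(U,V) - H(U;V) = 2 H(U,V) - H(U) - H(V)."""
        return 2 * H(p_uv) - H(p_uv.sum(axis=1)) - H(p_uv.sum(axis=0))

    p_xy = p.sum(axis=2)                      # joint pmf of (X, Y)
    p_xz = p.sum(axis=1)                      # joint pmf of (X, Z)
    p_yz = p.sum(axis=0)                      # joint pmf of (Y, Z)

    assert D(p_xy) >= 0                                       # (a) positivity
    assert np.isclose(D(p_xy), D(p_xy.T))                     # (b) symmetry
    p_x = p_xy.sum(axis=1)                    # marginal p(x); the pair (X, X) has pmf diag(p_x)
    assert np.isclose(D(np.diag(p_x)), 0.0)                   # (c) nullity for self
    assert D(p_xz) <= D(p_xy) + D(p_yz) + 1e-12               # (d) triangle inequality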
It is straightforward to visualize from the Venn diagrams in Fig. 5.2 (top) that
D(X, Y ) = H (X|Y ) + H (Y |X), (5.25)