6.1 Entropy of continuous sources 87
The various relations and properties that were obtained in the discrete case between
entropy, conditional entropy, joint entropy, relative entropy, and mutual information
also apply to the continuous case. To recall, for convenience, these relations and
properties are:
H(X, Y) = H(Y|X) + H(X)
        = H(X|Y) + H(Y),                                  (6.9)

H(X; Y) = H(X) − H(X|Y)
        = H(Y) − H(Y|X)                                   (6.10)
        = H(X) + H(Y) − H(X, Y),

D(X, Y) = H(X, Y) − H(X; Y)
        = H(X|Y) + H(Y|X),                                (6.11)

D(p‖q) ≥ 0,                                               (6.12)

H(X; Y) = D[p(x, y)‖p(x)p(y)] ≥ 0,                        (6.13)

D[p(x, y)‖q(x, y)] = D[p(x)‖q(x)] + D[p(y|x)‖q(y|x)].     (6.14)
In particular, it follows from Eqs. (6.13) and (6.10) that H(X |Y ) ≤ H (X ) and
H(Y |X ) ≤ H (Y ), with equality if the sources are independent. Thus for continuous
sources, conditioning reduces differential entropy, just as in the discrete case.
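As a quick sanity check, the identities in Eqs. (6.10), (6.12), and (6.13) can be verified numerically. The sketch below uses a small discrete joint distribution (chosen arbitrarily for illustration), since the same relations hold in both the discrete and continuous cases:

```python
import math

# Illustrative 2x2 joint distribution p(x, y) (arbitrary choice)
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy, in bits, of a probability mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginals p(x) and p(y)
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

H_xy, H_x, H_y = H(p_xy), H(p_x), H(p_y)

# Mutual information via Eq. (6.10): H(X;Y) = H(X) + H(Y) - H(X,Y)
I_xy = H_x + H_y - H_xy

# Equivalent KL form, Eq. (6.13): D[p(x,y) || p(x)p(y)]
D = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

assert abs(I_xy - D) < 1e-12  # Eqs. (6.10) and (6.13) agree
assert D >= 0                 # Eq. (6.12): relative entropy is nonnegative
```

Since D = H(X) − H(X|Y) here is strictly positive, the check also illustrates that conditioning on Y reduces the entropy of X whenever the two sources are dependent.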
I shall next describe a few examples of PDFs that lend themselves to closed-form, or analytical, definitions of differential entropy, with some illustrations regarding relative entropy, or KL distance.
Consider first the continuous uniform distribution, defined over the real interval of width u = b − a, Eq. (6.8). As we have seen, the corresponding entropy is H_uniform = log2(b − a) bit/symbol. We note that the entropy H_uniform is nonpositive if u ≤ 1.
The result shows that the entropy of a continuous uniform distribution of width u increases as the logarithm of u. In the particular case where u = 2^N, with N being an integer, H_uniform = N bit/symbol. In the limit u, N → ∞, or p(x) = 1/u = 2^−N → 0, corresponding to a uniform distribution of infinite width, the entropy is infinite, corresponding to an infinite number of degrees of freedom for source events having themselves an infinite information, I(x) = −log2[p(x)] = N bits. We thus observe that, short of any constraints on the definition interval, or PDF mean, the entropy is unbounded, or H_uniform → +∞ as N → ∞.
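The scaling H_uniform = log2(u), which equals N bit/symbol when u = 2^N, is straightforward to verify numerically; a minimal sketch:

```python
import math

def h_uniform(a, b):
    """Differential entropy, in bit/symbol, of a uniform PDF on [a, b]."""
    return math.log2(b - a)

assert h_uniform(0, 8) == 3.0    # width u = 2^3 gives H = 3 bit/symbol
assert h_uniform(0, 1) == 0.0    # unit width gives zero entropy
assert h_uniform(0, 0.5) < 0     # width u <= 1 gives nonpositive entropy
```

The last assertion illustrates a key difference from the discrete case: differential entropy can be negative.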
We may compute the relative entropy, or KL distance, between any continuous PDF p(x) defined over the domain X = [a, b], with u = (b − a) = 2^N, and the corresponding uniform PDF, which we now call q(x) = 1/u = 2^−N, according to Eq. (6.5):
D[p(x)‖q(x)] = ∫_X p(x) log [p(x)/q(x)] dx
             = ∫_X p(x) log [2^N p(x)] dx
             = ∫_X p(x) {log(2^N) + log p(x)} dx          (6.15)
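Since log(2^N) = N and the remaining term is −H(X), Eq. (6.15) reduces to D = N − H(X). This can be checked numerically by a Riemann sum; the sketch below uses a triangular PDF p(x) = 2x/u² on [0, u], an arbitrary choice made here only for illustration:

```python
import math

N = 3
u = 2 ** N          # interval width u = 2^N
M = 200_000         # number of integration steps
dx = u / M

D = 0.0             # KL distance D[p || q] = ∫ p log2(p/q) dx, with q = 1/u
h = 0.0             # differential entropy H(X) = -∫ p log2(p) dx
for i in range(M):
    x = (i + 0.5) * dx        # midpoint rule
    p = 2 * x / u**2          # illustrative triangular PDF on [0, u]
    D += p * math.log2(p * u) * dx
    h += -p * math.log2(p) * dx

# Eq. (6.15): D = log2(2^N) - H(X) = N - H(X)
assert abs(D - (N - h)) < 1e-6
assert D >= 0   # consistent with Eq. (6.12)
```

As expected, D is strictly positive here, since the triangular PDF differs from the uniform reference q(x).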