which shows that if two strings are algorithmically independent, their joint complexity $K(x, y)$ is given by the sum of their individual complexities. It is clear that $K(x, y) = K(y, x)$. What if $x$ and $y$ are not algorithmically independent? This means that the computation of $x$ provides some clue as to the computation of $y$. A program calculating $y$ could be $q_y = q_x q_{y|x}$. The machine $U$ first computes $x$, then uses the program $q_{y|x}$ to compute $y$.
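As a simple worked case (an illustration added here, not a result from the text): let $y$ be the bitwise complement of $x$. Then $q_{y|x}$ need only encode the fixed instruction "complement every output bit," whose size does not depend on $x$, so that
$$K(y) \leq |q_x| + |q_{y|x}| = K(x) + O(1).$$
However complex $x$ may be, knowing its program reduces the description of $y$ to a constant overhead.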
Next, we shall define the conditional complexity $K(y\,|\,q_x)$, which represents the minimal size of a program describing $y$ given the program describing $x$. It is also noted $K(y\,|\,x^*)$, with $x^* = q_x$, or, for simplicity, $K(y\,|\,x)$. This last notation should be used with the awareness that $|\,x$ is a condition on the program $q_x$, not on the string $x$.
The issue of finding the minimal size of $q_y = q_x q_{y|x}$ is far from trivial. Chaitin showed$^{22}$ that
$$K(x, y) \leq K(x) + K(y\,|\,x) + c \;\leftrightarrow\; K(y\,|\,x) = K(x, y) - K(x) + c', \tag{7.26}$$
where $c$ represents a small overhead constant, which is one bit for sufficiently long strings. The second relation stems from the first, with $c' \geq 0$ being a nonnegative constant.$^{23}$
Since the joint complexity $K(x, y)$ is symmetrical in the arguments $x, y$, we also have
$$K(x\,|\,y) = K(x, y) - K(y) + c'. \tag{7.27}$$
If $x$ and $y$ are algorithmically independent, it is clear that $q_{y|x} = q_y$ (there is no clue from $x$ to compute $y$), or equivalently $K(y\,|\,x) = K(y)$, and likewise, $K(x\,|\,y) = K(x)$. In this case, $K(x, y) = K(x) + K(y) + c'$.
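For instance (an illustrative case): take $x$ and $y$ to be two unrelated, incompressible strings of respective lengths $n$ and $m$, so that $K(x) \approx n$ and $K(y) \approx m$. A shortest program for the pair essentially has to spell out both strings in full, giving
$$K(x, y) \approx n + m \approx K(x) + K(y),$$
consistent with the additivity stated above.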
We can now define the mutual complexity $K(x; y)$ of $x$ and $y$ (note the delimiter ";") according to any of the following:
$$\begin{aligned}
K(x; y) &= K(x) + K(y) - K(x, y) \\
K(x; y) &= K(x) - K(x\,|\,y) + c' \\
K(x; y) &= K(y) - K(y\,|\,x) + c',
\end{aligned} \tag{7.28}$$
where $c'$ is a nonnegative constant. In Eq. (7.28), the last two definitions stem from the first one and the properties in Eqs. (7.26) and (7.27).
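Kolmogorov complexity is uncomputable, so Eq. (7.28) cannot be evaluated exactly by any program. A common practical workaround, offered here only as an illustrative sketch, is to replace $K$ with the output length of a real compressor, which yields a computable upper bound; the snippet below uses Python's zlib, with arbitrary demo strings, and approximates $K(y\,|\,x)$ by $K(x, y) - K(x)$ as in Eq. (7.26):

import zlib

def C(s: bytes) -> int:
    # Compressed length in bytes: a computable upper-bound proxy for K(s).
    return len(zlib.compress(s, 9))

# Two closely related demo strings (arbitrary choices for illustration).
x = b"0123456789" * 200
y = b"1234567890" * 200

Kx, Ky, Kxy = C(x), C(y), C(x + y)   # C(x + y) stands in for K(x, y)

K_y_given_x = Kxy - Kx               # Eq. (7.26): K(y|x) = K(x, y) - K(x) + c'
K_mutual = Kx + Ky - Kxy             # Eq. (7.28), first definition

print(f"K(x)~{Kx}  K(y)~{Ky}  K(x,y)~{Kxy}")
print(f"K(y|x)~{K_y_given_x}  K(x;y)~{K_mutual}")

Note that the proxy is only approximate: a real compressor has finite memory, and $C(x + y)$ need not equal $C(y + x)$, whereas the true $K(x, y)$ is symmetric. For related strings as above, the estimated $K(y\,|\,x)$ is small and the mutual complexity large; for unrelated random strings, the mutual term drops toward zero.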
The above results represent various relations of algorithmic complexity between two strings $x, y$. We immediately note that such relations bear a striking resemblance to those concerning the joint or conditional entropies and mutual information of two random-event sources $X, Y$ in classical IT.
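For side-by-side comparison, recall the standard classical-IT identities (added here for reference; they are exact, whereas the algorithmic relations above hold only up to additive constants):
$$\begin{aligned}
H(X, Y) &= H(X) + H(Y\,|\,X),\\
I(X; Y) &= H(X) + H(Y) - H(X, Y) = H(X) - H(X\,|\,Y) = H(Y) - H(Y\,|\,X).
\end{aligned}$$
The formal correspondence $K \leftrightarrow H$ maps Eq. (7.26) onto the chain rule for entropy and Eq. (7.28) onto the three equivalent expressions for mutual information.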
$^{22}$ See: G. J. Chaitin, A theory of program size formally identical to information theory. J. ACM, 22 (1975), 329–40, www.cs.auckland.ac.nz/CDMTCS/chaitin/acm75.pdf. See also: G. J. Chaitin, Algorithmic information theory. IBM J. Res. Dev., 21 (1977), 350–9, 496, www.cs.auckland.ac.nz/CDMTCS/chaitin/ibm.pdf.
$^{23}$ From the first inequality in Eq. (7.26) we obtain $K(y\,|\,x) \geq K(x, y) - K(x) - c$; therefore, there exists a constant $c' \geq 0$ for which $K(y\,|\,x) = K(x, y) - K(x) + c'$.