To describe how the system behaves, let $t = 1, 2, \ldots$ denote the discrete times at which state transitions occur, let $p({}^{t}x_i)$ denote the probability of state $x_i$ at time $t$, and let

$$ {}^{t}p = \langle p({}^{t}x_1), p({}^{t}x_2), p({}^{t}x_3) \rangle $$

denote the probability distribution of all states of the system at time $t$. Furthermore, let $M = [m_{ij}]$ denote the matrix of conditional probabilities $p({}^{t+1}x_j \mid {}^{t}x_i)$ for all pairs $\langle x_i, x_j \rangle \in X^2$, which are independent of $t$. That is,

$$ m_{ij} = p({}^{t+1}x_j \mid {}^{t}x_i) $$

for all $i, j \in \mathbb{N}_3$ and all $t \in \mathbb{N}$.
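By the law of total probability, these definitions imply that the distribution at the next time is obtained by multiplying ${}^{t}p$ (as a row vector) by $M$: $p({}^{t+1}x_j) = \sum_i p({}^{t}x_i)\, m_{ij}$. The Python sketch below shows this one-step update. The matrix `M` is a hypothetical example chosen only for illustration (the matrix of Figure 3.3 is not reproduced in this text), and the helper names `is_stochastic` and `step` are likewise illustrative.

```python
# Hypothetical 3-state transition matrix (NOT the matrix of Figure 3.3):
# M[i][j] = p(x_j at time t+1 | x_i at time t), with states x_1, x_2, x_3
# stored at indices 0, 1, 2.
M = [
    [0.0, 0.5, 0.5],
    [0.2, 0.0, 0.8],
    [1.0, 0.0, 0.0],
]


def is_stochastic(matrix, tol=1e-12):
    """Return True if every row is a probability distribution."""
    return all(
        all(m >= 0 for m in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def step(p, matrix):
    """One transition: p(x_j at t+1) = sum_i p(x_i at t) * m_ij."""
    n = len(matrix)
    return [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]


p_t = [1.0, 0.0, 0.0]   # system known to be in state x_1 at time t
p_next = step(p_t, M)   # simply row x_1 of M
print(p_next)
```

When the current state is known with certainty, the next-time distribution is just the corresponding row of $M$.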
Given the probability distribution ${}^{t}p$ at some time $t$, the system is capable of predicting probability distributions at time $t + k$ ($k = 1, 2, \ldots$) or probability distributions of sequences of future states of various lengths. The Shannon entropy of each of these distributions measures the amount of uncertainty in the respective prediction. We can also measure the amount of information contained in each prediction made by the system (the predictive informativeness of the system). For each prediction type, this is the difference between the maximum predictive uncertainty allowed by the framework of the system and the actual predictive uncertainty. The maximum predictive uncertainty is
obtained for the state-transition matrix $\hat{M} = [\hat{m}_{ij}]$, in which each row is a uniform probability distribution. In our case, $\hat{m}_{ij} = 1/3$ for all $i, j \in \mathbb{N}_3$.
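A minimal Python sketch of the informativeness calculation for predictions of distributions at time $t + k$, assuming entropies measured in bits (base-2 logarithms) and obtaining ${}^{t+k}p$ by applying the one-step update $k$ times (equivalently, ${}^{t}p\,M^k$). The matrix `M` is hypothetical, not the one in Figure 3.3, and the function names are illustrative. Because every row of $\hat{M}$ is uniform, one step under $\hat{M}$ turns any distribution into the uniform one, so the maximum uncertainty of these predictions is $\log_2 3$ for every $k \geq 1$.

```python
import math

# Hypothetical transition matrix (NOT the one in Figure 3.3); index 0 is x_1.
M = [
    [0.0, 0.5, 0.5],
    [0.2, 0.0, 0.8],
    [1.0, 0.0, 0.0],
]
N = len(M)
# Maximum-uncertainty matrix M-hat: every row uniform, so m-hat_ij = 1/3 here.
M_HAT = [[1.0 / N] * N for _ in range(N)]


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def step(p, matrix):
    """One transition: p(x_j at t+1) = sum_i p(x_i at t) * m_ij."""
    n = len(matrix)
    return [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def predict(p, matrix, k):
    """Distribution at time t+k, i.e. p M^k, computed by k single steps."""
    for _ in range(k):
        p = step(p, matrix)
    return p


def informativeness(p_t, matrix, k):
    """Return (maximum, actual, max - actual) uncertainty for time t+k."""
    h_max = shannon_entropy(predict(p_t, M_HAT, k))
    h_act = shannon_entropy(predict(p_t, matrix, k))
    return h_max, h_act, h_max - h_act


p_t = [1.0, 0.0, 0.0]  # state x_1 at time t
for k in range(1, 5):
    h_max, h_act, info = informativeness(p_t, M, k)
    print(f"t+{k}: max={h_max:.3f} actual={h_act:.3f} info={info:.3f}")
```

Since $\log_2 3$ is the largest possible entropy of a distribution over three states, the informativeness computed this way is never negative.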
To illustrate the calculation of predictive uncertainty and predictive informativeness for the various prediction types, let us assume that the system is in state $x_1$ at time $t$ (as indicated in Figure 3.3 by the arrow pointing at $x_1$). This is formally expressed as ${}^{t}p = \langle 1, 0, 0 \rangle$. Maximum and actual uncertainties
for some predictions are given in Figure 3.4. The diagram, which contains all sequences of states with nonzero probabilities of length 4 or less, also shows the probabilities of individual states at each of the considered times. Each of the arrows under the diagram indicates the time at which the prediction is made and the time span of the prediction. The first four arrows are predictions of the next-time probability distribution made at different times. The next three arrows indicate predictions made at time $t$ about sequences of states of lengths 2, 3, and 4. The last three arrows indicate predictions made at time $t$ about probability distributions at times $t + 2$, $t + 3$, and $t + 4$. The two numbers on top of each arrow are the two uncertainties needed for calculating the informativeness of the prediction: the maximum one and the actual one. Let us follow in detail the calculation of some of these uncertainties.
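Predictions about sequences need the joint distribution of several future states. The Python sketch below, again using a hypothetical matrix `M` rather than the one in Figure 3.3, enumerates every sequence of the next `n` states, multiplies transition probabilities along each path, keeps only sequences with nonzero probability, and takes the Shannon entropy (in bits) of the result. Whether the sequence lengths in Figure 3.4 count the current state at time $t$ is not stated in this excerpt; here the hypothetical parameter `n` counts future states only. When the initial state is known with certainty, as with ${}^{t}p = \langle 1, 0, 0 \rangle$, including it would not change the entropy.

```python
import itertools
import math

# Hypothetical transition matrix (NOT the one in Figure 3.3); index 0 is x_1.
M = [
    [0.0, 0.5, 0.5],
    [0.2, 0.0, 0.8],
    [1.0, 0.0, 0.0],
]
N = len(M)
M_HAT = [[1.0 / N] * N for _ in range(N)]  # uniform rows: maximum uncertainty


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def sequence_distribution(p_t, matrix, n):
    """Joint distribution of the next n states (states at t+1, ..., t+n).

    p(s_1, ..., s_n) = sum_i p(x_i at t) * m[i][s_1] * m[s_1][s_2] * ...
    Only sequences with nonzero probability are kept.
    """
    size = len(matrix)
    dist = {}
    for seq in itertools.product(range(size), repeat=n):
        total = 0.0
        for start, p_start in enumerate(p_t):
            prob, prev = p_start, start
            for s in seq:
                prob *= matrix[prev][s]  # follow one transition of the path
                prev = s
            total += prob
        if total > 0:
            dist[seq] = total
    return dist


p_t = [1.0, 0.0, 0.0]  # state x_1 at time t
for n in (2, 3, 4):
    h_max = shannon_entropy(sequence_distribution(p_t, M_HAT, n).values())
    h_act = shannon_entropy(sequence_distribution(p_t, M, n).values())
    print(f"n={n}: max={h_max:.3f} actual={h_act:.3f} info={h_max - h_act:.3f}")
```

Under $\hat{M}$, all $3^n$ sequences are equally likely when the initial state is known, so under this convention the maximum uncertainty is $n \log_2 3$ bits.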
Using Figure 3.4 as a guide, the next-state prediction made at time $t + 2$ is calculated by the formula