380 Discounted Dynamic Programming Under Uncertainty
inputs) can be viewed as a random dynamical system. We draw upon
our earlier results to analyze the long-run behavior of the optimal input
process.
6.2 The Model
A dynamic programming problem is specified by the following objects:
$\langle S, A, q, u, \delta \rangle$, where $S$ is a nonempty Borel subset of a Polish (i.e.,
complete, separable metric) space, interpreted as the set of states of some
system; $A$ is a nonempty Borel subset of a Polish space, interpreted as
the set of actions available to the decision maker; $q$ is the law of motion
of the system: it associates (Borel measurably) with each pair $(s, a)$ a
probability measure $q(\cdot \mid s, a)$ on the Borel sigma-field of $S$, so that when the
system is in state $s$ and an action $a$ is chosen, it moves to a state $s'$
in the next period according to the distribution $q(\cdot \mid s, a)$; $u$ is a bounded
Borel measurable function on $S \times A$, interpreted as the utility, income,
or immediate return: when the system is in state $s$ and the chosen action
is $a$, the decision maker receives an income $u(s, a)$; $\delta$ is a discount factor,
$0 < \delta < 1$. A policy (or plan) $\zeta = (\zeta_t)$ specifies for each $t \ge 1$ which
action to choose in the $t$th period as a Borel measurable function of the
history $h = (s_1, a_1, \ldots, a_{t-1}; s_t)$ of the system up to period $t$, or, more
generally, $\zeta$ specifies for each $h$ a probability distribution $\zeta_t(\cdot \mid h)$ on the
Borel subsets of $A$.
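
To make the five objects concrete, the following Python sketch encodes a
small finite instance of the model. Finiteness of the state and action
sets, the class name `FiniteModel`, and every number in the example are
assumptions made purely for illustration; the text itself allows general
Borel subsets of Polish spaces.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FiniteModel:
    """Hypothetical finite stand-in for the tuple <S, A, q, u, delta>."""
    states: List[str]
    actions: List[str]
    q: Dict[Tuple[str, str], Dict[str, float]]  # q[(s, a)][s_next] = q(s_next | s, a)
    u: Dict[Tuple[str, str], float]             # u[(s, a)] = immediate return
    delta: float                                # discount factor

    def validate(self) -> None:
        """Check the requirements stated in the text for this finite case."""
        if not (0.0 < self.delta < 1.0):
            raise ValueError("discount factor must satisfy 0 < delta < 1")
        for s in self.states:
            for a in self.actions:
                dist = self.q[(s, a)]
                if not set(dist) <= set(self.states):
                    raise ValueError(f"q(.|{s},{a}) puts mass outside S")
                if any(p < 0.0 for p in dist.values()):
                    raise ValueError(f"q(.|{s},{a}) has a negative entry")
                if abs(sum(dist.values()) - 1.0) > 1e-12:
                    raise ValueError(f"q(.|{s},{a}) does not sum to 1")
                if (s, a) not in self.u:
                    raise ValueError(f"u({s},{a}) is missing")


# Illustrative numbers only (not taken from the text).
model = FiniteModel(
    states=["low", "high"],
    actions=["wait", "invest"],
    q={
        ("low", "wait"): {"low": 1.0},
        ("low", "invest"): {"low": 0.5, "high": 0.5},
        ("high", "wait"): {"low": 0.25, "high": 0.75},
        ("high", "invest"): {"high": 1.0},
    },
    u={
        ("low", "wait"): 0.0,
        ("low", "invest"): -1.0,
        ("high", "wait"): 2.0,
        ("high", "invest"): 1.0,
    },
    delta=0.9,
)
model.validate()  # raises ValueError if any requirement fails
```

In a finite model every function is Borel measurable and $u$ is
automatically bounded, so the only conditions left to check are that each
$q(\cdot \mid s, a)$ is a probability distribution on the state set and that
the discount factor lies strictly between 0 and 1.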
A Borel function $f$ from $S$ into $A$ defines a policy: when in state $s$,
choose the action $f(s)$ (independently of when and how the system has
arrived at state $s$). We denote the corresponding policy by $(f^{(\infty)})$. Such
policies are called stationary, and $f$ is somewhat informally referred to
as a policy function.
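
The difference between a general policy and a stationary one can be
sketched in a few lines of Python. The representation of a history as a
flat tuple, the helper name `stationary_policy`, and the state and action
labels are assumptions for illustration; only deterministic policies are
covered here, not the randomized case.

```python
from typing import Callable, Tuple

# A history h = (s_1, a_1, ..., a_{t-1}, s_t), stored as a flat tuple.
History = Tuple[str, ...]
Policy = Callable[[History], str]


def stationary_policy(f: Callable[[str], str]) -> Policy:
    """Build the policy induced by f: it acts on the current state only."""
    def zeta(history: History) -> str:
        current_state = history[-1]  # s_t is the last entry of h
        return f(current_state)
    return zeta


def f(state: str) -> str:
    # Hypothetical rule: invest when the state is "low", otherwise wait.
    return "invest" if state == "low" else "wait"


zeta = stationary_policy(f)
h_short = ("low",)
h_long = ("high", "wait", "high", "wait", "low")
# Both histories end in "low", so both calls return "invest".
print(zeta(h_short), zeta(h_long))
```

A general policy would be free to inspect every entry of the history; the
stationary one discards everything except the last state.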
A policy $\zeta$ associates with each initial state $s$ a corresponding
$t$th period expected return $u_t(\zeta)(s)$ and an expected discounted total
return

$$I(\zeta)(s) = \sum_{t=1}^{\infty} \delta^{t-1} u_t(\zeta)(s), \tag{2.1}$$

where $\delta$ is the discount factor, $0 < \delta < 1$.
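
Because $u$ is bounded and $0 < \delta < 1$, the series in (2.1) converges,
and dropping all terms after period $T$ changes the sum by at most
$\delta^{T} \sup|u| / (1 - \delta)$. The sketch below evaluates the truncated
sum for a stationary policy on a finite state set. The matrix `P`, the
vector `r`, the function name `discounted_return`, and the numbers are
illustrative assumptions, with `P[s][j]` standing for $q(j \mid s, f(s))$ and
`r[s]` for $u(s, f(s))$.

```python
from typing import List


def discounted_return(P: List[List[float]], r: List[float],
                      delta: float, horizon: int) -> List[float]:
    """Sum of delta**(t-1) * u_t over t = 1..horizon, for every initial state."""
    n = len(r)
    total = [0.0] * n
    # dist[s][j] = probability of being in state j in period t, given s_1 = s.
    dist = [[1.0 if j == s else 0.0 for j in range(n)] for s in range(n)]
    weight = 1.0  # delta**(t-1), starting at t = 1
    for _ in range(horizon):
        for s in range(n):
            u_t = sum(dist[s][j] * r[j] for j in range(n))  # expected return in period t
            total[s] += weight * u_t
        # Advance the state distribution by one period.
        dist = [[sum(dist[s][k] * P[k][j] for k in range(n)) for j in range(n)]
                for s in range(n)]
        weight *= delta
    return total


# Two states; state 1 is absorbing and pays nothing (illustrative numbers).
P = [[0.5, 0.5],
     [0.0, 1.0]]
r = [1.0, 0.0]
values = discounted_return(P, r, delta=0.5, horizon=60)
print(values)
```

In this example the period-$t$ expected return from state 0 is $0.5^{t-1}$,
so the series is geometric with ratio $0.25$ and the exact value is $4/3$;
from state 1 the return is 0 in every period.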
A policy $\zeta^*$ will be called optimal if $I(\zeta^*)(s) \ge I(\zeta)(s)$ for all policies
$\zeta$ and all $s \in S$. The problem, then, is to find an optimal policy.
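
The definition requires the inequality to hold at every initial state at
once. As a rough illustration, the sketch below compares the four
deterministic stationary policies of a two-state, two-action toy model and
looks for one that dominates the others pointwise. The model, the function
names, and the truncation horizon are all assumptions for illustration. The
comparison is restricted to stationary policies, whereas the definition
above ranges over all policies, so the sketch does not by itself establish
optimality in that sense.

```python
from itertools import product


def stationary_return(q, u, delta, f, horizon=200):
    """Truncated expected discounted return of the stationary policy f."""
    n = len(f)
    P = [q[s][f[s]] for s in range(n)]  # row s is q(. | s, f(s))
    r = [u[s][f[s]] for s in range(n)]  # r[s] is u(s, f(s))
    value = [0.0] * n
    # Repeating value <- r + delta * P value, starting from 0, yields the
    # first `horizon` terms of the discounted sum.
    for _ in range(horizon):
        value = [r[s] + delta * sum(P[s][j] * value[j] for j in range(n))
                 for s in range(n)]
    return value


def dominates(v1, v2, tol=1e-6):
    """True if v1[s] >= v2[s] at every state s (up to a small tolerance)."""
    return all(a >= b - tol for a, b in zip(v1, v2))


# q[s][a] is the next-state distribution; u[s][a] is the immediate return.
q = [
    [[1.0, 0.0], [0.5, 0.5]],    # state 0: action 0, action 1
    [[0.25, 0.75], [0.0, 1.0]],  # state 1: action 0, action 1
]
u = [[0.0, -1.0], [2.0, 1.0]]
delta = 0.9

policies = list(product(range(2), repeat=2))  # f = (action at 0, action at 1)
returns = {f: stationary_return(q, u, delta, f) for f in policies}
best = [f for f in policies
        if all(dominates(returns[f], returns[g]) for g in policies)]
print(best, returns[best[0]] if best else None)
```

In this toy model a single stationary policy happens to dominate the other
three at both states. In general the list `best` may be empty for an
arbitrary family of candidates, since pointwise dominance is only a partial
order on return functions.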