CHAPTER 7
Bellman’s Principle of Optimality and
Its Generalizations
In dynamic programming Bellman’s principle of optimality is extremely important.
The principle says (Bellman, 1957) that each subpolicy of an optimum policy must
itself be an optimum policy with regard to the initial and the terminal states of the
subpolicy. Here, the terms “policy,” “subpolicy,” “optimum policy,” and “state”
are primitive and are not given specific meanings. In each application, these
terms are understood in ways appropriate to the situation of interest. For
example, in a weighted graph, an optimum path from a vertex $v_1$ to a vertex $v_2$ is
the one with the smallest sum of weights of the edges along the path. In this case,
a “policy” means a path, a “subpolicy” a subpath, an “optimum policy” stands
for a path with the smallest weight, and “the initial and terminal states” for the
beginning and ending vertices $v_1$ and $v_2$, respectively. The principle implies that
for each chosen optimum path from $v_1$ to $v_2$, each subpath must be an optimum
path connecting the initial and terminal vertices of the subpath.
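To make the graph example concrete, the following sketch (not from the original text) computes shortest paths by repeatedly relaxing edges according to Bellman's recurrence, $d(v) = \min_{(u,v)} [\, d(u) + w(u,v) \,]$ with $d(\text{source}) = 0$, and then checks that a subpath of the optimum path is itself optimum. The graph, vertex names, and weights are hypothetical, and the relaxation scheme is one standard way of applying the recurrence rather than a method prescribed here.

    # Sketch: shortest paths via Bellman's recurrence on a small weighted graph.
    # All names, edges, and weights below are illustrative only.

    def shortest_paths(vertices, edges, source):
        """Repeatedly relax edges: d(v) = min over edges (u, v) of d(u) + w.

        `edges` is a list of (u, v, weight) triples.  Returns a distance
        dict and a predecessor dict describing the optimum paths.
        """
        dist = {v: float("inf") for v in vertices}
        pred = {v: None for v in vertices}
        dist[source] = 0
        for _ in range(len(vertices) - 1):      # enough passes to converge
            for u, v, w in edges:
                if dist[u] + w < dist[v]:       # relax edge (u, v)
                    dist[v] = dist[u] + w
                    pred[v] = u
        return dist, pred

    def path_to(pred, v):
        """Reconstruct the optimum path ending at v from the predecessor map."""
        path = []
        while v is not None:
            path.append(v)
            v = pred[v]
        return path[::-1]

    # Hypothetical graph: the optimum path from v1 to v2 is v1 -> a -> b -> v2.
    vertices = ["v1", "a", "b", "v2"]
    edges = [("v1", "a", 2), ("v1", "b", 5), ("a", "b", 1),
             ("a", "v2", 7), ("b", "v2", 3)]

    dist, pred = shortest_paths(vertices, edges, "v1")
    print(dist["v2"], path_to(pred, "v2"))   # 6 ['v1', 'a', 'b', 'v2']

    # By the principle, the subpath v1 -> a -> b of the optimum path is itself
    # an optimum path from v1 to b: its length dist['b'] = 3 beats the direct
    # edge of weight 5.
    print(dist["b"], path_to(pred, "b"))     # 3 ['v1', 'a', 'b']

In this setting the principle is what justifies the recurrence: if the optimum path to $v_2$ passes through $b$, its initial segment must already be an optimum path to $b$, so only the values $d(u)$ of predecessors need to be remembered.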
7.1. A Brief Historical Note
During the last half century or so, Bellman’s principle has become the cornerstone
of the study of the shortest-path problem, the inventory problem, the traveling-
salesman problem, equipment-replacement models, resource-allocation problems,
etc. For details see Dreyfus and Law (1977). However, since the early
1970s, a number of practical optimization problems have been put forward in
management science, economics, and many other fields. One feature common
to all these problems is that the performance criteria are not real numbers, and
consequently the optimum options are no longer maxima or minima of some
set of real numbers. Many dynamic programming models with nonscalar-valued
performance criteria have been established. See (Bellman and Zadeh, 1970; Mitten,
1974; Wu, 1980; Wu, 1981a; Furukawa, 1980; Henig, 1983; Baldwin and
Pilsworth, 1982).