For a useful summary of some of the properties of upper semicontinuous functions, see Royden (1968, pp. 195–196).
Theorem 4.1 was proved by Maitra (1968) under the assumptions [A.1], [A.2], [A.4], and [A.3′]: the reward function u is a bounded, upper semicontinuous function on S × A.
6.6.2 The Controlled Semi-Markov Model
The dynamic programming problem we consider is specified by (1) a state space S (a Borel subset of a complete separable metric space), (2) an action space A (a compact metric space), (3) a reward rate r(x, a), which accrues in state x when action a is taken, (4) a distribution γ(du | x, a) of the holding time in state x when action a is taken, and (5) a transition probability q(dz | x, a) of the new state z which occurs at the end of this holding period in state x.
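To fix ideas, here is a minimal Python sketch of ingredients (2)–(5) as sampling primitives. The class name `SemiMarkovModel`, its field names, and the exponential holding times and Gaussian transition kernel are all illustrative assumptions standing in for arbitrary γ and q, not part of the model above.

```python
import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical container for ingredients (2)-(5); states and actions
# are floats purely for concreteness.
@dataclass
class SemiMarkovModel:
    actions: list[float]                                  # compact action space A
    reward_rate: Callable[[float, float], float]          # r(x, a)
    sample_holding_time: Callable[[float, float], float]  # draw T ~ gamma(du | x, a)
    sample_next_state: Callable[[float, float], float]    # draw z ~ q(dz | x, a)

# One concrete instance, assuming exponential holding times and a
# Gaussian transition kernel as stand-ins for arbitrary gamma and q.
model = SemiMarkovModel(
    actions=[0.0, 0.5, 1.0],
    reward_rate=lambda x, a: -(x - a) ** 2,
    sample_holding_time=lambda x, a: random.expovariate(1.0 + a),
    sample_next_state=lambda x, a: random.gauss(a * x, 1.0),
)
```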
Informally, a policy is a rule that determines which action to take based
on past actions and past and present states. At the kth stage, the policy
chooses an action
$$\hat{a}_k = f_k(X_0, \hat{a}_0, \ldots, X_{k-1}, \hat{a}_{k-1}, X_k)$$
based on past actions $\hat{a}_0, \ldots, \hat{a}_{k-1}$ and past and present states $X_0, \ldots, X_k$. The evolution of the state with time, for a given policy
$f = \{f_0, f_1, \ldots, f_k, \ldots\}$, may be described as follows. An initial state $X_0 = x_0$ is given, and an action $\hat{a}_0 = f_0(X_0)$ is chosen; the state remains at $X_0$ for a random time $T_0$ whose distribution is $\gamma(du \mid X_0, \hat{a}_0)$, given $X_0$ and $\hat{a}_0$. After time $T_0$ a new state $X_1$ occurs with distribution $q(dz \mid X_0, \hat{a}_0)$, and, observing $X_1$, the action $\hat{a}_1 = f_1(X_0, \hat{a}_0, X_1)$ is chosen; the state remains at $X_1$ for a period $T_1$ with distribution $\gamma(du \mid X_1, \hat{a}_1)$, at the end of which (i.e., at time $T_0 + T_1$) a new state $X_2$ occurs having distribution $q(dz \mid X_1, \hat{a}_1)$, conditionally given $X_1, \hat{a}_1$.
This continues indefinitely.
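The evolution just described is precisely a simulation loop. The sketch below (reusing the hypothetical `model` from the previous sketch) draws each holding time $T_k$ from $\gamma(du \mid X_k, \hat{a}_k)$ and the next state from $q(dz \mid X_k, \hat{a}_k)$; the `policy` callable stands in for $f_k$ evaluated on the full history $(X_0, \hat{a}_0, \ldots, X_k)$, and the closing example is a stationary policy in the sense defined next.

```python
def evolve(model, policy, x0, n_stages):
    # history alternates X_0, a_0, X_1, a_1, ..., ending at the current X_k
    history = [x0]
    x, clock, path = x0, 0.0, []
    for _ in range(n_stages):
        a = policy(history)                  # a_k = f_k(X_0, a_0, ..., X_k)
        t = model.sample_holding_time(x, a)  # T_k ~ gamma(du | X_k, a_k)
        path.append((clock, x, a, t))        # entry time, state, action, holding time
        clock += t                           # next jump occurs at T_0 + ... + T_k
        x = model.sample_next_state(x, a)    # X_{k+1} ~ q(dz | X_k, a_k)
        history += [a, x]
    return path

# A stationary policy looks only at the current state (last history entry),
# e.g. choosing the available action closest to it:
stationary = lambda h: min(model.actions, key=lambda a: abs(a - h[-1]))
print(evolve(model, stationary, x0=0.0, n_stages=5))
```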
We now give a more formal description.
A policy is a sequence of functions $\zeta = \{f_0, f_1, f_2, \ldots\}$: $f_0$ is a Borel measurable map on $S$ into $A$, $f_1$ is a Borel measurable map on $S \times A \times S$ into $A$, ..., $f_k$ is a Borel measurable map on $S \times A \times S \times A \times \cdots \times S \times A \times S = (S \times A)^k \times S$ into $A$. A policy $\zeta$ is Markovian if, for each $k$, $f_k$ depends only on the last coordinate among its arguments, i.e., each $f_k$ is a Borel measurable map on $S$ into $A$. A policy $\zeta$ is stationary if