Bhattacharya R., Majumdar M. Random Dynamical Systems: Theory and Applications


5.6 Complements and Details 373

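Finally,

\[
\lim_{n\to\infty}\bigl|E\exp\{i\xi S_{\cdots}\}-\exp\{-\cdots\}\bigr|
\le \lim_{n\to\infty}\bigl|E\exp\{i\xi S_{\cdots}\}-E\exp\{i\xi S_{\cdots}\}\bigr|
+\lim_{n\to\infty}\bigl|E\exp\{i\xi S_{\cdots}\}-\exp\{-\cdots\}\bigr|
\]
\[
= 0+\lim_{n\to\infty}\bigl|E\exp\{i\xi S_{\cdots}\}-\exp\{-\cdots\}\bigr|
\le \cdots\,|\xi|\,\varepsilon. \tag{C4.14}
\]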

The extreme right side of (C4.14) goes to zero as ε ↓ 0, while the extreme

left does not depend on ε.

Condition (ii) of the martingale CLT is called the conditional Lindeberg condition.

The martingale CLT as presented here is due to Brown (1977). Its

proof follows Bhattacharya and Waymire (1990).

The following CLT is due to Gordin and Lifsic (1978). Its proof is based on a version of the martingale CLT due to Billingsley (1961) and Ibragimov (1963). The term “ergodic” in its statement is defined in a general context under Remark C4.1. For our purposes, it is enough to know that a Markov process $\{X_n : n \ge 0\}$ with a unique invariant probability $\pi$ is ergodic (if $X_0$ has distribution $\pi$). The process $\{X_n : n \ge 0\}$ is then said to be ergodic even when $X_0$ has an arbitrary initial distribution.

As a very special case, the following is obtained: If $p(x, dy)$ admits a unique invariant probability $\pi$, and $\{X_n : n \ge 0\}$ is a Markov process with transition probability $p(x, dy)$ and initial distribution $\pi$, then $\{X_n : n \ge 0\}$ is ergodic.

Theorem C4.1 (CLT for discrete-parameter Markov processes) Assume $p(x, dy)$ admits an invariant probability $\pi$ and, under the initial distribution $\pi$, $\{X_n\}$ is ergodic. Assume also that $\tilde{h} = h - \int h\,d\pi$ is in the range of $I - T$. Then, under initial $\pi$,

\[
\frac{1}{\sqrt{n}} \sum_{m=0}^{n-1} \Bigl( h(X_m) - \int h\,d\pi \Bigr) \to N(0, \sigma^2) \quad \text{as } n \to \infty, \tag{C4.15}
\]

where $\sigma^2 = \sigma_h^2$ is given by (4.1) with $g$ defined by $(I - T)g = \tilde{h}$.

374 Invariant Distributions: Estimation and Computation

Remark C4.1 (Birkhoff’s ergodic theorem) Let $\tilde{T}$ be a measurable map on a state space $(\tilde{S}, \tilde{\mathcal{S}})$ into itself, which preserves a probability measure $\mu$ on $(\tilde{S}, \tilde{\mathcal{S}})$. Birkhoff’s ergodic theorem says: For every $\mu$-integrable $g$, $\frac{1}{n}\sum_{m=0}^{n-1} g(\tilde{T}^m x)$ converges a.s. (w.r.t. $\mu$) and in $L^1(\mu)$ to a function $\bar{g}(x)$, say, where $\bar{g}$ is invariant: $\bar{g}(\tilde{T}x) = \bar{g}(x)$ $\forall x \in \tilde{S}$, and $\int \bar{g}\,d\mu = \int g\,d\mu$. If the dynamical system $(\tilde{T}, \tilde{S}, \mu)$ is ergodic, $\bar{g} = \int g\,d\mu$. Here $(\tilde{T}, \tilde{S}, \mu)$ is said to be ergodic if all invariant functions are constant a.s. with respect to $\mu$.

Suppose one takes $\tilde{S} = S^{\infty}$, $\tilde{\mathcal{S}} = \mathcal{S}^{\otimes\infty}$, and $X_n$ ($n \ge 0$) a stochastic process with state space $(S, \mathcal{S})$, defined on the canonical probability space $(S^{\infty}, \mathcal{S}^{\otimes\infty}, Q)$ such that $X_n(\omega)$ is the projection onto the $n$th coordinate of $\omega \in S^{\infty}$ ($n \ge 0$). The process $X = (X_0, X_1, \ldots)$ is stationary, i.e., $\tilde{T}X \equiv (X_1, X_2, \ldots)$ has the same distribution $Q$ as $X$. Then, by Birkhoff’s ergodic theorem, if $E|h(X)| < \infty$, $\frac{1}{n}\sum_{m=0}^{n-1} h(\tilde{T}^m X)$ converges a.s. and in $L^1(Q)$ to an invariant random variable $\bar{h}(X)$, say. If $\{X_n : n \ge 0\}$ is ergodic then $\bar{h}(X) = E h(X) \equiv \int h\,dQ$ a.s. As a special case, with $h(X) := f(X_0)$ for some measurable $f : S \to \mathbb{R}$ such that $E|f(X_0)| < \infty$, one has: $\frac{1}{n}\sum_{m=0}^{n-1} f(X_m)$ converges a.s. and in $L^1(Q)$ to an invariant $\bar{h}(X)$, which equals $E f(X_0)$ if $\{X_n : n \ge 0\}$ is ergodic. Note that the validity of the last italicized statement depends only on the distribution of $X$. Hence it holds for a stationary process $\{X_n : n \ge 0\}$ on an arbitrary probability space $(\Omega, \mathcal{F}, P)$.

For proofs of Birkhoff’s ergodic theorem, we refer to Billingsley (1965,

pp. 20–29), or Bhattacharya and Waymire (1990, pp. 224–227).
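The special case in Remark C4.1, where time averages of $f(X_m)$ converge to $\int f\,d\pi$, can be illustrated numerically. The following is a minimal sketch, not part of the original text: the two-state transition matrix `P`, the function values `f`, the sample size, and the helper name `ergodic_average` are all illustrative assumptions. The chain is started from a fixed state rather than from $\pi$, in line with the earlier remark that the process is called ergodic for an arbitrary initial distribution.

```python
import random

def ergodic_average(P, f, x0, n, seed=0):
    """Time average (1/n) * sum_{m=0}^{n-1} f(X_m) along one simulated path."""
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(n):
        total += f[x]
        # Draw the next state from row P[x] by inverting the cumulative sums.
        u, cum = rng.random(), 0.0
        for y, pxy in enumerate(P[x]):
            cum += pxy
            if u < cum:
                break
        x = y
    return total / n

# Hypothetical two-state chain; pi = (0.8, 0.2) satisfies pi P = pi.
P = [[0.9, 0.1], [0.4, 0.6]]
pi = [0.8, 0.2]
f = [1.0, 5.0]

space_average = sum(p * v for p, v in zip(pi, f))   # integral of f with respect to pi
time_average = ergodic_average(P, f, x0=1, n=200_000)
print(space_average, time_average)
```

With a long path the two printed numbers should be close; the simulation illustrates the statement and does not prove it.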

Theorem C4.2 below is due to Bhattacharya and Lee (1988). Theorem

4.2 is a special case of this.

Theorem C4.2 Let $\{X_n : n \ge 0\}$ be a Markov process defined by $X_{n+1} = \alpha_{n+1} X_n$ ($n \ge 0$), where $\{\alpha_n : n \ge 1\}$ is a sequence of i.i.d. monotone

nondecreasing maps on a closed subset S of



, and $X_0$ is independent of $\{\alpha_n : n \ge 1\}$. Suppose the splitting hypothesis (H) of Section 3.5 holds, as in the case of Corollary 5.3. Then, no matter what the distribution of $X_0$ is, the CLT (C4.15) holds for every $h$ that may be expressed as the difference $f_1 - f_2$ between two nondecreasing measurable functions $f_i : S \to \mathbb{R}$ such that $f_i \in L^2(\pi)$ ($i = 1, 2$).

Bhattacharya (1982) contains additional information on the CLT for

Markov processes in continuous time, much of which can be adapted to

discrete time.


Central limit theorems are truly “central” for statistical inference in time series. Of the great body of literature on the statistical inference for time series, we mention Anderson (1971), Box and Jenkins (1976), Brillinger (1981), Brockwell and Davis (1991), Grenander and Rosenblatt (1957), Granger and Terasvirta (1993), Hannan (1970), Priestley (1981), and Rosenblatt (2000). A comprehensive account of the asymptotic statistical theory for estimation and testing for parametric and semiparametric linear as well as nonlinear models may be found in Taniguchi and Kakizawa (2000). Hwang (2002) contains some asymptotic results, including bootstrapping of the nonparametric estimation of the driving function f in nonlinear autoregressive models (5.1) of Chapter 4, under the hypothesis of Theorem C5.1 in Complements and Details, Chapter 4. For bootstrapping in the first-order model NLAR(1), see Franke, Kreiss, and Mammen (2002). The results of Hwang on NLAR(k), k ≥ 1, were obtained independently of those on the first-order model NLAR(1) by Franke et al. (2002).

The Billingsley–Ibragimov CLT (Billingsley 1961, Ibragimov 1963), applied to Markov processes in Gordin and Lifsic (1978) and Bhattacharya (1982), is especially useful when the process is not necessarily irreducible.

5.7 Supplementary Exercises

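(1) (Definitions and simple properties of martingales). A sequence of random variables $\{X_n\}$ ($n = 0, 1, \ldots$) is said to be $\{\mathcal{F}_n\}$-adapted with respect to an increasing sequence of sigmafields $\mathcal{F}_n \subset \mathcal{F}_{n+1} \subset \cdots$ ($n = 0, 1, \ldots$) if $X_n$ is $\mathcal{F}_n$-measurable (for every $n$). An $\{\mathcal{F}_n\}$-adapted sequence $\{X_n\}$ with finite expectations is said to be an $\{\mathcal{F}_n\}$-martingale if $E(X_{n+1} \mid \mathcal{F}_n) = X_n$ a.s. ($n \ge 0$). If, in particular, $\mathcal{F}_n = \sigma\{X_j : 0 \le j \le n\}$ $\forall n$, then $\{X_n\}$ is simply said to be a martingale. Suppose that $\{X_n\}$ is an $\{\mathcal{F}_n\}$-adapted martingale. Prove the following statements:

(a) $E(X_n \mid \mathcal{F}_m) = X_m$ a.s. for every $m \le n$. [Hint: Take successive conditional expectations, recalling that $E(X \mid \mathcal{G}) = E[E(X \mid \mathcal{F}) \mid \mathcal{G}]$ for all sigmafields $\mathcal{G} \subset \mathcal{F}$.]

(b) $\{X_n\}$ is a martingale (with respect to the sigmafields $\mathcal{G}_n = \sigma\{X_j : 0 \le j \le n\}$). [Hint: $E(X_{n+1} \mid \mathcal{G}_n) = E[E(X_{n+1} \mid \mathcal{F}_n) \mid \mathcal{G}_n] = E(X_n \mid \mathcal{G}_n) = X_n$ a.s., noting that if $\{X_n\}$ is $\{\mathcal{F}_n\}$-adapted then $\mathcal{G}_n \subset \mathcal{F}_n$ $\forall n$.]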


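(c) The differences $Z_n \equiv X_n - X_{n-1}$ ($n \ge 1$) satisfy (i) $E(Z_{n+1} \mid \mathcal{F}_n) = 0 = E(Z_{n+1} \mid \mathcal{G}_n)$ ($n \ge 0$), (ii) if $EX_n^2 < \infty$ for all $n$, then $EZ_n Z_m = 0$ $\forall n \ne m$.

(d) $EX_n = EX_0$ for all $n$ (constancy of expectations).

(e) $EX_n^2 = EX_0^2 + \sum_{m=1}^{n} EZ_m^2$, if $EX_n^2 < \infty$ for all $n$, and $Z_m \equiv X_m - X_{m-1}$ ($m \ge 1$).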

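(2) (Examples of martingales).

(a) Suppose $\{Z_n : n \ge 0\}$ is a sequence of independent random variables with finite means $EZ_n = \mu_n$ ($n \ge 0$). Prove that (i) $S_n = \sum_{j=0}^{n} (Z_j - \mu_j)$, $n \ge 0$, is an $\{\mathcal{F}_n\}$-martingale, where $\mathcal{F}_n = \sigma\{Z_j : 0 \le j \le n\}$, (ii) if $EZ_n^2 < \infty$ for all $n$, then $X_n \equiv S_n^2 - \sum_{j=0}^{n} \sigma_j^2$ ($n \ge 0$) is an $\{\mathcal{F}_n\}$-martingale, where $\sigma_j^2 \equiv \operatorname{var}(Z_j)$. [Hint: $E(X_{n+1} \mid \mathcal{F}_n) = E\bigl((S_{n+1})^2 - \sum_{j=0}^{n+1} \sigma_j^2 \mid \mathcal{F}_n\bigr) = S_n^2 - \sum_{j=0}^{n} \sigma_j^2 + E\bigl((Z_{n+1} - \mu_{n+1})^2 \mid \mathcal{F}_n\bigr) + 2S_n E(Z_{n+1} - \mu_{n+1} \mid \mathcal{F}_n) - \sigma_{n+1}^2 = X_n$.]

(b) Suppose $Y_j$ ($j \ge 1$) are independent random variables with a common finite nonzero mean $\mu$. Show that $X_n = \mu^{-n} \prod_{j=1}^{n} Y_j$ ($n \ge 1$) is an $\{\mathcal{F}_n\}$-martingale, where $\mathcal{F}_n = \sigma\{Y_j : 1 \le j \le n\}$ ($n \ge 1$).

(c) Let $X$ be a random variable with a finite expectation. If $\mathcal{F}_n$ ($n \ge 1$) is an increasing sequence of sigmafields, $\mathcal{F}_n \subset \mathcal{F}$ for all $n$, show that $X_n \equiv E(X \mid \mathcal{F}_n)$, $n \ge 1$, is an $\{\mathcal{F}_n\}$-martingale.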
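The product martingale of Exercise 2(b) can be checked exactly on a small finite example. The sketch below is an illustration under stated assumptions and is no substitute for the proof: the $Y_j$ are taken i.i.d. (a special case of "independent with common mean") with a hypothetical two-point distribution, and the helper name `check_product_martingale` is made up for this example. It enumerates every path $(Y_1, \ldots, Y_n)$ and verifies $E(X_{n+1} \mid Y_1, \ldots, Y_n) = X_n$ in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def check_product_martingale(values, probs, n_max):
    """Check E(X_{n+1} | Y_1..Y_n) = X_n for X_n = mu^{-n} * Y_1 * ... * Y_n.

    values, probs: a finite distribution for each Y_j (illustrative assumption).
    Returns True if the identity holds on every path of length 1..n_max-1.
    """
    mu = sum(Fraction(v) * Fraction(p) for v, p in zip(values, probs))
    for n in range(1, n_max):
        for path in product(values, repeat=n):
            x_n = Fraction(1)
            for y in path:
                x_n *= Fraction(y) / mu
            # Average X_{n+1} = X_n * Y_{n+1} / mu over the next, independent Y.
            cond_exp = sum(Fraction(p) * x_n * Fraction(v) / mu
                           for v, p in zip(values, probs))
            if cond_exp != x_n:
                return False
    return True

# Hypothetical example: Y takes the values 1 and 3 with probability 1/2 each (mu = 2).
ok = check_product_martingale([1, 3], [Fraction(1, 2), Fraction(1, 2)], n_max=6)
print(ok)
```

The check succeeds because the next factor $Y_{n+1}/\mu$ has mean one and is independent of the past, which is exactly the step the exercise asks to prove.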

(3) (Stopping times). Let $\{X_n\}$ be an $\{\mathcal{F}_n\}$-adapted sequence of random variables ($n \ge 0$). A stopping time, or an $\{\mathcal{F}_n\}$-stopping time, $\tau$ is a random variable taking values in $\{0, 1, 2, \ldots\} \cup \{\infty\}$ such that $\{\tau \le n\} \in \mathcal{F}_n$ for every $n = 0, 1, 2, \ldots$. Prove the following:

(a) $\tau$ is a stopping time if and only if $\{\tau = n\} \in \mathcal{F}_n$ for all $n$.

(b) If $\tau$ is a stopping time, then so is $\tau \wedge m \equiv \min\{\tau, m\}$ for every nonnegative integer $m$.

negative integer m.

(d) The first passage time $\tau_B = \inf\{n \ge 0 : X_n \in B\}$ is a stopping time for every Borel set $B$. Here “inf” of an empty set is taken to be $\infty$.

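(4) (Optional stopping of martingales). Suppose $\{X_n\}$ is an $\{\mathcal{F}_n\}$-adapted martingale ($n \ge 0$) and $\tau$ is a stopping time.

(a) Prove that $EX_\tau = EX_0$ if $\tau$ is bounded. [Hint: Suppose $\tau \le m$. Define $X_{-1} = 0$. Then

\[
EX_\tau = \sum_{n=0}^{m} E X_n \cdot 1_{\{\tau = n\}} = \sum_{n=0}^{m} E X_n \bigl(1_{\{\tau \le n\}} - 1_{\{\tau \le n-1\}}\bigr).
\]

Now use the fact that $E[(X_n - X_{n-1}) 1_{\{\tau \le n-1\}}] = E[1_{\{\tau \le n-1\}} E(X_n - X_{n-1} \mid \mathcal{F}_{n-1})] = 0$ $\forall n \ge 1$, to get

\[
EX_\tau = \sum_{n=0}^{m} E X_n 1_{\{\tau \le n\}} - \sum_{n=0}^{m} E(X_{n-1} + X_n - X_{n-1}) 1_{\{\tau \le n-1\}}
= \sum_{n=0}^{m} E X_n 1_{\{\tau \le n\}} - \sum_{n=0}^{m} E X_{n-1} 1_{\{\tau \le n-1\}}
= EX_m 1_{\{\tau \le m\}} = EX_m,
\]

and $EX_m = EX_0$ by Exercise 1(d).]

(b) If $\tau$ is finite a.s. and $E|X_{\tau \wedge m} - X_\tau| \to 0$ as $m \to \infty$, then prove that $EX_\tau = EX_0$. [Hint: Take $\tau \wedge m$ as the stopping time in (a) to get $EX_{\tau \wedge m} = EX_0$, and then take the limit as $m \to \infty$.]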

(5) (First passage times for the simple symmetric random walk). Let $S_0 = 0$, $S_n = Z_1 + Z_2 + \cdots + Z_n$ ($n \ge 1$), where $Z_j$ ($j \ge 1$) are i.i.d., and $P(Z_j = +1) = P(Z_j = -1) = 1/2$. Let $a$, $b$ be two positive integers and consider the first passage time $\tau = \tau_{\{-a,b\}}$ (to $-a$ or $b$). Let $\mathcal{F}_n = \sigma\{Z_j : 1 \le j \le n\}$, $\mathcal{F}_0 = \{\emptyset, \Omega\}$. Let $A_1$ be the event “$\{S_n\}$ reaches $-a$ before it reaches $b$,” and $A_2$ be the event “$\{S_n\}$ reaches $b$ before it reaches $-a$.”

(a) Prove that $P(A_1) = b/(a + b)$, $P(A_2) = a/(a + b)$. [Hint: $\{S_n : n \ge 0\}$ is an $\{\mathcal{F}_n\}$-martingale, and $ES_0 = 0 = ES_\tau = bP(A_2) - aP(A_1) = bP(A_2) - a(1 - P(A_2)) = (a + b)P(A_2) - a$. Here we use Exercise 4(b).]

(b) Prove that $E\tau = ab$. [Hint: Consider the $\{\mathcal{F}_n\}$-martingale $X_0 = 0$, $X_n = S_n^2 - nEZ_1^2 \equiv S_n^2 - n$ ($n \ge 1$). By Exercise 4(b), $ES_\tau^2 - E\tau = 0$, or $E\tau = (-a)^2 P(A_1) + b^2 P(A_2) = a^2 b/(a + b) + b^2 a/(a + b) = ab$. To apply Exercise 4(b), one needs to show that $E|X_{\tau \wedge m} - X_\tau| \equiv E|(S_{\tau \wedge m}^2 - S_\tau^2) - (\tau \wedge m - \tau)| \to 0$. Since $S_{\tau \wedge m}^2$, $S_\tau^2$ are bounded by $\max\{a^2, b^2\}$, and $\tau \wedge m \to \tau$ a.s. as $m \to \infty$, it is enough to prove that $E\tau < \infty$. For then $\tau \wedge m \uparrow \tau$ a.s., and by the monotone convergence theorem, $E|\tau \wedge m - \tau| \to 0$ as $m \to \infty$. To prove that $E\tau < \infty$, write $E\tau = \sum_{m=1}^{\infty} P(\tau \ge m)$. Now the probability that the random walk reaches $\{-a, b\}$ in $n_0 = \max\{a, b\}$ steps or less is at least $(1/2)^{n_0} \equiv \delta$, say, no matter where the random walk starts in $(-a, b)$, since the distance between the starting point and $\{-a, b\}$ is never more than $n_0$. Hence $P(\tau \ge m) \le (1 - \delta)^k$ if $m \ge n_0 k$. That is, $P(\tau \ge m) \le (1 - \delta)^{[m/n_0]}$ $\forall m$, where $[r]$ is the integer part of $r$. This shows that the series for $E\tau$ converges (exponentially fast).]
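The answers in Exercise 5 can be cross-checked without martingales by first-step analysis. The sketch below is an independent numerical check and not the method of the exercise; the function name `first_passage` and its optional parameter `p` are assumptions introduced here. For each interior starting point it solves the linear equations "value at $x$ = average of the values at $x \pm 1$" (plus 1 for the expected time) with exact rational arithmetic, then reads off the values at the starting point 0.

```python
from fractions import Fraction

def first_passage(a, b, p=Fraction(1, 2)):
    """Exact (P(reach b before -a), E[tau]) for a simple random walk started at 0."""
    p = Fraction(p)
    q = 1 - p
    n = a + b - 1  # number of interior states -a+1, ..., b-1

    def solve(rhs):
        # Thomas algorithm for x_k - q*x_{k-1} - p*x_{k+1} = rhs_k,
        # with the boundary values already moved into rhs.
        c = [Fraction(0)] * n
        d = [Fraction(0)] * n
        c[0], d[0] = -p, rhs[0]
        for k in range(1, n):
            denom = 1 + q * c[k - 1]
            c[k] = -p / denom
            d[k] = (rhs[k] + q * d[k - 1]) / denom
        x = [Fraction(0)] * n
        x[-1] = d[-1]
        for k in range(n - 2, -1, -1):
            x[k] = d[k] - c[k] * x[k + 1]
        return x

    hit_rhs = [Fraction(0)] * n
    hit_rhs[-1] = p                 # boundary value 1 at b enters with weight p
    hit_b = solve(hit_rhs)          # probability of reaching b before -a
    time = solve([Fraction(1)] * n) # expected number of steps until absorption
    start = a - 1                   # index of state 0 among the interior states
    return hit_b[start], time[start]

prob_A2, expected_tau = first_passage(3, 5)
print(prob_A2, 1 - prob_A2, expected_tau)
```

For $a = 3$, $b = 5$ the three printed values can be compared with $a/(a+b)$, $b/(a+b)$ and $ab$ from parts (a) and (b).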

(6) (Wald’s identity). Let $Y_j$ ($j \ge 1$) be a sequence of i.i.d. random variables with finite mean $\mu = EY_1$, $Y_0 \equiv 0$. Let $\tau$ be an $\{\mathcal{F}_n\}$-stopping time, where $\mathcal{F}_n = \sigma\{Y_j : 0 \le j \le n\}$. Write $S_n = \sum_{j=0}^{n} Y_j$. Suppose $E\tau < \infty$ and $E|S_\tau - S_{\tau \wedge m}| \to 0$ as $m \to \infty$. Then $ES_\tau = \mu E\tau$. [Hint: Consider the $\{\mathcal{F}_n\}$-martingale $X_n = S_n - n\mu$, and apply Exercise 4(b) to get $EX_\tau = EX_0 = 0$, i.e., $ES_\tau - E\tau\,\mu = 0$.]

(7) (First passage times for the asymmetric simple random walk). Let $\{Z_j : j \ge 1\}$ be i.i.d., $P(Z_j = +1) = p$, $P(Z_j = -1) = q = 1 - p$, and assume $p > 1/2$. Let $\tau = \tau_{\{-a,b\}}$

. Prove that, for the random walk



− n( p − q), n ≥ 1, S

= 0,

\[
E\tau = \frac{a + b}{p - q} \cdot \frac{1 - (q/p)^{a}}{1 - (q/p)^{a+b}} - \frac{a}{p - q}.
\]

[Hint: By Exercise (6), $ES_\tau = (p - q)E\tau$. But $ES_\tau = (-a)P(A_1) + bP(A_2)$, where $A_1$, $A_2$ are as in Exercise (5). Now use $P(A_2)$ ($= 1 - P(A_1)$) as given in Example 6.3, Chapter 2 (i.e., (6.22), with $x = 0$, $c = -a$, $d = b$).]
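The two steps of the hint in Exercise 7 can be sanity-checked by simulation. The sketch below is illustrative only and rests on assumptions: the closed form in `expected_tau` uses the standard gambler's-ruin expression $P(A_2) = (1 - (q/p)^a)/(1 - (q/p)^{a+b})$, which is presumed here to match (6.22) of Chapter 2 (not reproduced in this section), and the parameter values, sample size, seed, and function names are arbitrary choices.

```python
import random

def simulate_first_passage(a, b, p, n_paths=20_000, seed=1):
    """Monte Carlo estimates of (E[tau], E[S_tau]) for the walk started at 0."""
    rng = random.Random(seed)
    total_tau = 0
    total_s = 0
    for _ in range(n_paths):
        s = 0
        tau = 0
        while -a < s < b:              # run until the walk hits -a or b
            s += 1 if rng.random() < p else -1
            tau += 1
        total_tau += tau
        total_s += s
    return total_tau / n_paths, total_s / n_paths

def expected_tau(a, b, p):
    """E[tau] from Wald's identity combined with the gambler's-ruin probability."""
    q = 1 - p
    r = q / p
    prob_b = (1 - r ** a) / (1 - r ** (a + b))   # P(reach b before -a), assumed form
    return ((a + b) * prob_b - a) / (p - q)      # E[S_tau] / (p - q)

a, b, p = 3, 4, 0.7
est_tau, est_s = simulate_first_passage(a, b, p)
print(est_tau, expected_tau(a, b, p))     # simulated vs. closed-form E[tau]
print(est_s, (2 * p - 1) * est_tau)       # Wald: E[S_tau] vs. (p - q) E[tau]
```

Each printed pair should agree up to Monte Carlo error; a larger `n_paths` tightens the agreement at the cost of run time.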

Discounted Dynamic Programming

Under Uncertainty

The basic need for a special theory to explain behavior under conditions of

uncertainty arises from two considerations: (1) subjective feelings of imperfect

knowledge when certain types of choices, typically commitments over time, are

made; (2) the existence of certain observed phenomena, of which insurance is

the most conspicuous example, which cannot be explained on the assumption

that individuals act with subjective certainty.

Kenneth J. Arrow

6.1 Introduction

In this chapter we briefly review some results on discounted dynamic programming under uncertainty, and indicate how Markov processes and random dynamical systems are generated by optimal policies. In Section 6.2, following a precise description of the dynamic programming framework, we turn to the special case where the set S of states is countable, and the set A of actions is finite. Here the link between optimal policies and the celebrated functional equation can be established with no measure theoretic complications. In Section 6.3 we study the maximum theorem, which is of independent interest in optimization theory and is a key to the understanding of the basic result in the next section. In Section 6.4 we explore the more general model where S is a Borel subset of a Polish space, and A is a compact action space, and spell out the conditions under which there is a stationary optimal policy.

The dynamic programming technique reviewed here has been particularly powerful in attacking a variety of problems in intertemporal economics. In Section 6.5 we discuss in detail the aggregative model of optimal economic growth under uncertainty. Here, given the specific structure of the model, the process of optimal stocks (or optimal inputs) can be viewed as a random dynamical system. We draw upon our earlier results to analyze the long-run behavior of the optimal input process.

6.2 The Model

A dynamic programming problem is specified by the following objects: $\langle S, A, q, u, \delta \rangle$, where $S$ is a nonempty Borel subset of a Polish (i.e., complete, separable metric) space, interpreted as the set of states of some system; $A$ is a nonempty Borel subset of a Polish space, interpreted as the set of actions available to the decision maker; $q$ is the law of motion of the system: it associates (Borel measurably) with each pair $(s, a)$ a probability measure $q(\cdot \mid s, a)$ on $\mathcal{S}$, the Borel sigmafield of $S$; when the system is in state $s$ and an action $a$ is chosen, it moves to the state $s'$ in the next period according to the distribution $q(\cdot \mid s, a)$; $u$ is a bounded Borel measurable function on $S \times A$, interpreted as the utility, income, or immediate return: when the system is in state $s$, and the chosen action is $a$, the decision maker receives an income $u(s, a)$; $\delta$ is a discount factor, $0 < \delta < 1$. A policy (or plan) $\zeta = (\zeta_t)$ specifies for each $t \ge 1$ which action to choose in the $t$th period as a Borel measurable function of the history $h = (s_1, a_1, \ldots, a_{t-1}; s_t)$ of the system up to period $t$, or more generally, $\zeta$ specifies for each $h$ a probability distribution $\zeta_t(\cdot \mid h)$ on the Borel subsets of $A$.

A Borel function $f$ from $S$ into $A$ defines a policy. When in state $s$, choose an action $f(s)$ (independently of when and how the system has arrived at state $s$). We denote the corresponding policy by $(f^{(\infty)})$. Such policies are called stationary, and $f$ is somewhat informally referred to as an optimal policy function.

A policy $\zeta$ associates with each initial state $s$ a corresponding $t$th period expected return $u_t(\zeta)(s)$ and an expected discounted total return

\[
I(\zeta)(s) = \sum_{t=1}^{\infty} \delta^{t-1} u_t(\zeta)(s), \tag{2.1}
\]

where $\delta$ is the discount factor, $0 < \delta < 1$.
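For a stationary policy on a finite state space, the series (2.1) can be evaluated numerically by truncation. The following is a minimal sketch under stated assumptions: the state space is finite, `u_f[s]` stands for $u(s, f(s))$ and `P_f[s][j]` for the transition probability under the action $f(s)$, and the function name, the truncation `horizon`, and the numbers are all hypothetical. Since $u$ is bounded by some $B$, the truncation error is at most $B\,\delta^{\text{horizon}}/(1 - \delta)$.

```python
def discounted_return(u_f, P_f, delta, horizon=2000):
    """Truncated series sum_{t=1}^{horizon} delta^(t-1) * u_t for each initial state.

    u_f[s]    : one-period return in state s under the stationary policy f
    P_f[s][j] : probability of moving from s to j under the action f(s)
    """
    n = len(u_f)
    total = [0.0] * n
    # dist[s][j] = P(state in period t is j | initial state s); identity at t = 1.
    dist = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    disc = 1.0  # delta^(t-1)
    for _ in range(horizon):
        for s in range(n):
            # t-th period expected return u_t, discounted.
            total[s] += disc * sum(dist[s][j] * u_f[j] for j in range(n))
        # Advance the distribution of the state by one period.
        dist = [[sum(dist[s][k] * P_f[k][j] for k in range(n)) for j in range(n)]
                for s in range(n)]
        disc *= delta
    return total

# Hypothetical two-state example.
u_f = [1.0, 3.0]
P_f = [[0.5, 0.5], [0.2, 0.8]]
I_f = discounted_return(u_f, P_f, delta=0.9)
print(I_f)
```

A useful consistency check is the one-step recursion $I(s) = u(s, f(s)) + \delta \sum_j P_f(s, j)\, I(j)$, which the computed values should satisfy up to rounding.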

A policy $\zeta^*$ will be called optimal if $I(\zeta^*)(s) \ge I(\zeta)(s)$ for all policies $\zeta$ and $s \in S$. The problem, then, is to find an optimal policy.


6.2.1 Optimality and the Functional Equation

of Dynamic Programming

We begin with the case where the state space $S$ is countable and the action space $A$ is finite and introduce the celebrated functional equation that characterizes optimality. Recall from Chapter 2 that the states in a countable $S$ are labeled $i, j, k$ etc. Also, we write $q_{ij}(a)$ to denote the probability with which the state moves from $i$ to $j$ if (after observing $i$) the action $a$ is chosen. Thus, for any fixed $i \in S$ and $a \in A$, $[q_{ij}(a)]$ is the probability distribution of the state in the next period (i.e., $q_{ij}(a) \ge 0$, $\sum_j q_{ij}(a) = 1$). The immediate return function is bounded, i.e., there is some constant $B > 0$ such that $|u(i, a)| \le B$ for all $i \in S$ and $a \in A$.

Define

\[
V(i) = \sup_{\zeta} I(\zeta)(i).
\]

$V : S \to \mathbb{R}$ is the value function of the dynamic programming problem.

Clearly, a policy $\zeta^*$ is optimal if

\[
I(\zeta^*)(i) = V(i) \quad \text{for all } i \in S.
\]

Our first result characterizes the value function $V$ by the functional equation (2.2), which will be casually referred to as the “optimality equation” or the “functional equation” of dynamic programming.

Theorem 2.1 For all $i \in S$,

\[
V(i) = \max_{a \in A} \Bigl[ u(i, a) + \delta \sum_{j} q_{ij}(a) V(j) \Bigr]. \tag{2.2}
\]
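When $S$ is finite, a solution of the functional equation (2.2) can be approximated by iterating its right-hand side. The sketch below is offered under stated assumptions and is not a construction from the text: it uses the standard fact that the right-hand side of (2.2) defines a contraction with modulus $\delta$ on bounded functions, so successive approximations converge to its unique bounded fixed point; the names `bellman` and `value_iteration`, the tolerance, and the two-state, two-action data are hypothetical.

```python
def bellman(V, u, q, delta):
    """Right-hand side of (2.2): max over a of u(i,a) + delta * sum_j q_ij(a) V(j)."""
    n = len(V)
    return [max(u[i][a] + delta * sum(q[i][a][j] * V[j] for j in range(n))
                for a in range(len(u[i])))
            for i in range(n)]

def value_iteration(u, q, delta, tol=1e-12, max_iter=100_000):
    """Iterate the map in (2.2) from V = 0 until successive iterates differ by < tol."""
    V = [0.0] * len(u)
    for _ in range(max_iter):
        W = bellman(V, u, q, delta)
        if max(abs(w - v) for w, v in zip(W, V)) < tol:
            return W
        V = W
    return V

# Hypothetical data: u[i][a] is the immediate return, q[i][a][j] the law of motion.
u = [[1.0, 0.0],
     [0.0, 2.0]]
q = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
delta = 0.9

V = value_iteration(u, q, delta)
residual = max(abs(v - w) for v, w in zip(V, bellman(V, u, q, delta)))
print(V, residual)
```

The printed residual measures how nearly the computed `V` satisfies (2.2); an action attaining the maximum in each state then gives a candidate stationary policy.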

Proof. Let $\zeta$ be any policy, and suppose it chooses some action $a \in A$ in the initial period (period 1). Then,

\[
I(\zeta)(i) = u(i, a) + \delta \sum_{j} q_{ij}(a) w_{\zeta}(j),
\]

where $w_{\zeta}(j)$ represents the total expected return from period 2 onward, given that $\zeta$ is used and the state in period 2 is $j$. But $w_{\zeta}(j) \le V(j)$. Hence,

\[
I(\zeta)(i) \le u(i, a) + \delta \sum_{j} q_{ij}(a) V(j)
\le \max_{a \in A} \Bigl[ u(i, a) + \delta \sum_{j} q_{ij}(a) V(j) \Bigr].
\]

Since $\zeta$ is arbitrary, we have

\[
V(i) \equiv \sup_{\zeta} I(\zeta)(i) \le \max_{a \in A} \Bigl[ u(i, a) + \delta \sum_{j} q_{ij}(a) V(j) \Bigr].
\]

Conversely, using the finiteness of $A$, let $a_i$ be such that

\[
u(i, a_i) + \delta \sum_{j} q_{ij}(a_i) V(j)
\equiv \max_{a \in A} \Bigl[ u(i, a) + \delta \sum_{j} q_{ij}(a) V(j) \Bigr].
\]

Let $\zeta$ be the policy that chooses $a_i$ when the initial state is $i$, and if the next state is $j$, then views the process as originating in $j$, and uses a policy $\zeta_j$ such that $I(\zeta_j)(j) \ge V(j) - \varepsilon$. Hence,

\[
I(\zeta)(i) = u(i, a_i) + \delta \sum_{j} q_{ij}(a_i) I(\zeta_j)(j)
\ge u(i, a_i) + \delta \sum_{j} q_{ij}(a_i) V(j) - \delta\varepsilon.
\]

Since $V(i) \ge I(\zeta)(i)$, we have

\[
V(i) \ge u(i, a_i) + \delta \sum_{j} q_{ij}(a_i) V(j) - \delta\varepsilon.
\]

Hence,

\[
V(i) \ge \max_{a \in A} \Bigl[ u(i, a) + \delta \sum_{j} q_{ij}(a) V(j) \Bigr] - \delta\varepsilon.
\]

Since $\varepsilon$ is arbitrary,

\[
V(i) \ge \max_{a \in A} \Bigl[ u(i, a) + \delta \sum_{j} q_{ij}(a) V(j) \Bigr].
\]