4.3 Learning Control with Limited Training Data
M of the polynomial. We see that the training set error decreases steadily as the order of the polynomial increases. The test set error, however, reaches a minimum at M = 3 and thereafter increases as the order of the polynomial is increased.
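This order-selection behaviour can be reproduced on synthetic data; the sketch below fits polynomials of increasing order and compares training and test errors. The data set (a noisy sinusoid), the noise level, and the train/test split are illustrative assumptions, not the book's experiment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: a noisy sinusoid, with separate training
# and test points (the book's actual data set is not reproduced here).
t_train = np.linspace(0.0, 1.0, 10)
t_test = np.linspace(0.02, 0.98, 25)
y_train = np.sin(2 * np.pi * t_train) + 0.1 * rng.standard_normal(t_train.size)
y_test = np.sin(2 * np.pi * t_test) + 0.1 * rng.standard_normal(t_test.size)

def rms_error(order):
    """Fit a degree-`order` polynomial to the training set and return
    (training RMS error, test RMS error)."""
    coeffs = np.polyfit(t_train, y_train, order)
    e_tr = np.sqrt(np.mean((np.polyval(coeffs, t_train) - y_train) ** 2))
    e_te = np.sqrt(np.mean((np.polyval(coeffs, t_test) - y_test) ** 2))
    return e_tr, e_te

train_errs, test_errs = zip(*(rms_error(M) for M in range(8)))

# Training error can only decrease as the order grows (nested models) ...
assert all(a >= b - 1e-8 for a, b in zip(train_errs, train_errs[1:]))
# ... while the test error bottoms out at some intermediate order.
print("best test-set order:", int(np.argmin(test_errs)))
```

With this particular seed and noise level the minimising test order need not be exactly M = 3; the qualitative picture (monotone training error, U-shaped test error) is the point.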
4.3.2 Resampling Approach
Our purpose is to improve the function estimation performance of ANN learning controllers by using a polynomial fitting approach to generate interpolation samples.
Let

x_i^T = {x_i(t_1), x_i(t_2), ..., x_i(t_n)}

be an original training sample set of one system state, sampled at the same sampling rate ∆ > 0, and let

T_j = (X_j, Y_j),

where X_j = {x_1(t_j), x_2(t_j), ..., x_m(t_j)} is an original training sample point. An unlabelled sample set T̄ = {T̄_1, T̄_2, ..., T̄_N}, which has a size of ń, can be generated by the following steps.
Step 1. Using the original training sample set x_1^T, produce the segment of the local polynomial estimation x̂_1(t) of x_1(t) with respect to time t ∈ (t_1, t_n). Let

x_1(t) = x̂_1(t) + ε,

where x̂_1, x_1, t ∈ R and ε is the estimation residual.
Step 2. Repeat Step 1 to produce the local polynomial estimations x̂_i(t) (i = 2, 3, ..., m) and Ŷ(t) of x_i(t) (i = 2, 3, ..., m) and Y(t) with respect to time t ∈ (t_1, t_n).
Step 3. Divide the sampling rate ∆ > 0 by k = ń/n + 1 to produce the new sampling time interval ∆_k = ∆/k. We produce the new unlabelled samples T̄ by a combination of interpolating the polynomials x̂_i(t) (i = 1, 2, ..., m) and Ŷ(t) at the new sampling rate.
Figure 4.17 shows an example of one unlabelled training sample generation process. In our method, unlabelled training samples can be generated in any number.
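The three steps above can be sketched in code. The sketch below fits one polynomial per state trajectory and to Y(t), then re-evaluates all of them on the finer grid ∆_k = ∆/k; the use of a single global polynomial per trajectory (rather than the segmented local estimations of Steps 1–2), the fixed fitting order, and the helper name `resample_unlabelled` are simplifying assumptions:

```python
import numpy as np

def resample_unlabelled(X, Y, t, n_new, order=3):
    """Sketch of Steps 1-3.  X is (m, n): m state trajectories sampled
    at the n instants in t; n_new plays the role of the target size ń.
    Returns the finer time grid and the interpolated unlabelled samples."""
    m, n = X.shape
    delta = t[1] - t[0]                  # original sampling interval ∆
    k = n_new / n + 1                    # k = ń/n + 1            (Step 3)
    delta_k = delta / k                  # new interval ∆_k = ∆/k (Step 3)
    t_new = np.arange(t[0], t[-1] + 0.5 * delta_k, delta_k)
    # Polynomial estimations x̂_i(t) of each state (Steps 1-2),
    # evaluated on the new grid.
    X_hat = np.vstack([np.polyval(np.polyfit(t, X[i], order), t_new)
                       for i in range(m)])
    # Estimation Ŷ(t) of the label trajectory (Step 2).
    Y_hat = np.polyval(np.polyfit(t, Y, order), t_new)
    return t_new, X_hat, Y_hat

# Usage: m = 2 states, n = 10 original samples, roughly tripled.
t = np.linspace(0.0, 1.0, 10)
X = np.vstack([np.sin(t), np.cos(t)])
Y = np.sin(2 * t)
t_new, X_hat, Y_hat = resample_unlabelled(X, Y, t, n_new=30)
assert t_new.size > t.size
```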
Barron [8] has studied the way in which the residual sum-of-squares error decreases as the number of parameters in a model is increased. For neural networks he showed that this error falls as O(1/M), where M is the number of hidden nodes in a one-hidden-layer network. By contrast, the error only decreases as O(1/M^(2/d)), where d is the dimensionality of the input space, for polynomials, or indeed any other series expansion in which it is the coefficients of linear combinations of fixed functions that are adapted. However, from the former analysis, the number of hidden nodes M cannot be chosen arbitrarily large. For a practitioner, his theory provides guidance for choosing the number of variables d, the number of network nodes M, and the sample size n, such that both 1/M and (Md/n) log n are small. In particular, with M ∼ (n/(d log n))^(1/2), the bound on the mean squared error is a constant multiple of (d log n/n)^(1/2).
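Barron's guideline turns into a quick back-of-envelope calculation; the sketch below evaluates the recommended node count M and the corresponding error bound for a few sample sizes. The concrete values of n and d, and the helper name `barron_choice`, are illustrative assumptions:

```python
import math

def barron_choice(n, d):
    """Return (M, bound) per the guideline quoted above:
    M ~ (n / (d log n))^(1/2), with the mean-squared-error bound
    proportional to (d log n / n)^(1/2)."""
    M = math.sqrt(n / (d * math.log(n)))
    bound = math.sqrt(d * math.log(n) / n)
    return M, bound

for n in (1_000, 10_000, 100_000):
    M, bound = barron_choice(n, d=4)
    print(f"n={n:>6}: M ~ {M:6.1f}, error bound proportional to {bound:.4f}")
```

Note that M and the bound are exact reciprocals of each other under this choice, so growing the sample size n simultaneously justifies a larger network and tightens the error bound.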