4.3 Learning Control with Limited Training Data
M of the polynomial. We see that the training set error decreases steadily as the order of the polynomial increases. The test set error, however, reaches a minimum at M = 3 and thereafter increases as the order of the polynomial is increased.
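This order-selection behaviour can be reproduced on synthetic data; the sketch below fits polynomials of increasing order and compares training and test errors. The data set (a noisy sinusoid), the noise level, and the train/test split are illustrative assumptions, not the book's experiment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: a noisy sinusoid, with separate training
# and test points (the book's actual data set is not reproduced here).
t_train = np.linspace(0.0, 1.0, 10)
t_test = np.linspace(0.02, 0.98, 25)
y_train = np.sin(2 * np.pi * t_train) + 0.1 * rng.standard_normal(t_train.size)
y_test = np.sin(2 * np.pi * t_test) + 0.1 * rng.standard_normal(t_test.size)

def rms_error(order):
    """Fit a degree-`order` polynomial to the training set and return
    (training RMS error, test RMS error)."""
    coeffs = np.polyfit(t_train, y_train, order)
    e_tr = np.sqrt(np.mean((np.polyval(coeffs, t_train) - y_train) ** 2))
    e_te = np.sqrt(np.mean((np.polyval(coeffs, t_test) - y_test) ** 2))
    return e_tr, e_te

train_errs, test_errs = zip(*(rms_error(M) for M in range(8)))

# Training error can only decrease as the order grows (nested models) ...
assert all(a >= b - 1e-8 for a, b in zip(train_errs, train_errs[1:]))
# ... while the test error bottoms out at some intermediate order.
print("best test-set order:", int(np.argmin(test_errs)))
```

With this particular seed and noise level the minimising test order need not be exactly M = 3; the qualitative picture (monotone training error, U-shaped test error) is the point.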
4.3.2 Resampling Approach
Our purpose is to improve the function estimation performance of ANN learning controllers by using a polynomial fitting approach to generate interpolation samples.
Let

x_i^T = {x_i(t_1), x_i(t_2), ..., x_i(t_n)}

be an original training sample set of one system state, sampled at the same sampling rate ∆ > 0, and let

T_j = (X_j, Y_j),

where X_j = {x_1(t_j), x_2(t_j), ..., x_m(t_j)} is an original training sample point. An unlabelled sample set T̄ = {T̄_1, T̄_2, ..., T̄_N}, which has a size of ń, can be generated by the following steps.
Step 1. Using the original training sample set x_1^T, produce the segment of the local polynomial estimation x̂_1(t) of x_1(t) with respect to time t ∈ (t_1, t_n). Let

x_1(t) = x̂_1(t) + ε,

where x̂_1, x_1, t ∈ R and ε is the estimation residual.
Step 2. Repeat Step 1 to produce the local polynomial estimations x̂_i(t) (i = 2, 3, ..., m) and Ŷ(t) of x_i(t) (i = 2, 3, ..., m) and Y(t) with respect to time t ∈ (t_1, t_n).
Step 3. Divide the sampling rate ∆ > 0 by k = ń/n + 1 to produce the new sampling time interval ∆_k = ∆/k. We produce the new unlabelled samples T̄ by a combination of interpolating the polynomials x̂_i(t) (i = 1, 2, ..., m) and Ŷ(t) at the new sampling rate.
Figure 4.17 shows an example of one unlabelled training sample generation process. In our method, unlabelled training samples can be generated in any number.
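The three steps above can be sketched in code. The sketch below fits one polynomial per state trajectory and to Y(t), then re-evaluates all of them on the finer grid ∆_k = ∆/k; the use of a single global polynomial per trajectory (rather than the segmented local estimations of Steps 1–2), the fixed fitting order, and the helper name `resample_unlabelled` are simplifying assumptions:

```python
import numpy as np

def resample_unlabelled(X, Y, t, n_new, order=3):
    """Sketch of Steps 1-3.  X is (m, n): m state trajectories sampled
    at the n instants in t; n_new plays the role of the target size ń.
    Returns the finer time grid and the interpolated unlabelled samples."""
    m, n = X.shape
    delta = t[1] - t[0]                  # original sampling interval ∆
    k = n_new / n + 1                    # k = ń/n + 1            (Step 3)
    delta_k = delta / k                  # new interval ∆_k = ∆/k (Step 3)
    t_new = np.arange(t[0], t[-1] + 0.5 * delta_k, delta_k)
    # Polynomial estimations x̂_i(t) of each state (Steps 1-2),
    # evaluated on the new grid.
    X_hat = np.vstack([np.polyval(np.polyfit(t, X[i], order), t_new)
                       for i in range(m)])
    # Estimation Ŷ(t) of the label trajectory (Step 2).
    Y_hat = np.polyval(np.polyfit(t, Y, order), t_new)
    return t_new, X_hat, Y_hat

# Usage: m = 2 states, n = 10 original samples, roughly tripled.
t = np.linspace(0.0, 1.0, 10)
X = np.vstack([np.sin(t), np.cos(t)])
Y = np.sin(2 * t)
t_new, X_hat, Y_hat = resample_unlabelled(X, Y, t, n_new=30)
assert t_new.size > t.size
```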
Barron [8] has studied the way in which the residual sum-of-squares error decreases as the number of parameters in a model is increased. For neural networks he showed that this error falls as O(1/M), where M is the number of hidden nodes in a one-hidden-layer network. By contrast, the error only decreases as O(1/M^(2/d)), where d is the dimensionality of the input space, for polynomials, or indeed any other series expansion in which it is the coefficients of linear combinations of fixed functions that are adapted. However, from the former analysis, the number of hidden nodes M cannot be chosen arbitrarily large. For a practitioner, his theory provides guidance for choosing the number of variables d, the number of network nodes M, and the sample size n, such that both 1/M and (Md/n) log n are small. In particular, with M ∼ (n/(d log n))^(1/2), the bound on the mean squared error is a constant multiple of (d log n/n)^(1/2).
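Barron's guideline turns into a quick back-of-envelope calculation; the sketch below evaluates the recommended node count M and the corresponding error bound for a few sample sizes. The concrete values of n and d, and the helper name `barron_choice`, are illustrative assumptions:

```python
import math

def barron_choice(n, d):
    """Return (M, bound) per the guideline quoted above:
    M ~ (n / (d log n))^(1/2), with the mean-squared-error bound
    proportional to (d log n / n)^(1/2)."""
    M = math.sqrt(n / (d * math.log(n)))
    bound = math.sqrt(d * math.log(n) / n)
    return M, bound

for n in (1_000, 10_000, 100_000):
    M, bound = barron_choice(n, d=4)
    print(f"n={n:>6}: M ~ {M:6.1f}, error bound proportional to {bound:.4f}")
```

Note that M and the bound are exact reciprocals of each other under this choice, so growing the sample size n simultaneously justifies a larger network and tightens the error bound.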