102 4 Learning-based Control
instance, we can only collect very limited training samples from a cycle of
human demonstration. It is well known that when the ratio of the number of
training samples to the VC (Vapnik-Chervonenkis) dimension of the function
class is small, the estimate of the regression function is not accurate and,
therefore, the learning control results may not be satisfactory. Meanwhile,
real-time sensor data always contain random noise, which also degrades the
learning control performance. Thus, we need large data sets to overcome these
problems. Moreover, we sometimes need to include historical information on
system states and/or control inputs among the ANN inputs to build a more
stable model. This increases the input dimension (number of features) of the
neural network and, in turn, the amount of training data required.
In this work, our main aim is to generate additional training samples (called
unlabelled samples here) without increasing cost and to reinforce the learning
effect, so as to improve learning control.
The main problem in statistical pattern recognition is to design a classifier.
Considerable effort has been devoted to classifier design in small training
sample size situations [40], [41], [83]. Many methods and theoretical analyses
have focused on nearest neighbor re-sampling or bootstrap re-sampling. How-
ever, the major problem in learning control is function approximation, and
there is limited research exploring function regression under sparse-data con-
ditions. Janet and Alice's work in [18] examined three re-sampling methods
(cross validation, jackknife and bootstrap) for function estimation.
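As a concrete illustration of the re-sampling idea discussed above, the following is a minimal sketch of bootstrap re-sampling applied to function estimation (not the method of [18] itself); the data, polynomial degree, and number of bootstrap rounds are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Small training set: noisy samples of an underlying function (illustrative).
t = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=t.size)

# Bootstrap re-sampling: refit a low-order polynomial on each resampled
# set, then average the fitted curves to estimate the regression function.
n_boot = 200
fits = np.empty((n_boot, t.size))
for b in range(n_boot):
    idx = rng.integers(0, t.size, size=t.size)   # sample with replacement
    coeffs = np.polyfit(t[idx], y[idx], deg=3)
    fits[b] = np.polyval(coeffs, t)

estimate = fits.mean(axis=0)   # averaged (bagged) estimate of the function
spread = fits.std(axis=0)      # bootstrap spread, a rough uncertainty measure
```

The bootstrap spread gives a data-driven indication of how unreliable the fit is in the small-sample regime, which is precisely the difficulty this section addresses.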
In this chapter, we use the local polynomial fitting approach to rebuild the
time-varying functions of the system states individually. Then, through inter-
polation on a smaller sampling time interval, we can generate any number of
new samples (or unlabelled samples).
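The procedure above can be sketched as follows; this is a minimal illustration of local polynomial fitting followed by interpolation at a finer sampling interval, with the window size, polynomial degree, and demonstration signal all chosen as assumptions for the example:

```python
import numpy as np

def local_poly_resample(t, x, t_new, half_width=3, deg=2):
    """Fit a low-order polynomial in a sliding window around each new time
    point and evaluate it there, yielding interpolated (unlabelled) samples."""
    x_new = np.empty(t_new.size)
    for k, tk in enumerate(t_new):
        centre = np.argmin(np.abs(t - tk))           # nearest original sample
        lo = max(0, centre - half_width)
        hi = min(t.size, centre + half_width + 1)
        d = min(deg, hi - lo - 1)                    # shrink degree at edges
        coeffs = np.polyfit(t[lo:hi], x[lo:hi], deg=d)
        x_new[k] = np.polyval(coeffs, tk)
    return x_new

# Sparse demonstration data at a coarse sampling interval (illustrative).
t = np.linspace(0.0, 2.0, 11)
x = np.cos(t)

# Re-sample at a 10x finer interval to generate new unlabelled samples.
t_fine = np.linspace(0.0, 2.0, 101)
x_fine = local_poly_resample(t, x, t_fine)
```

Because each fit is local, the method tracks time-varying behaviour without committing to a single global model of the state trajectory.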
4.3.1 Effect of Small Training Sample Size
Our learning human control problem can be thought of as approximately
building a map between the system states X and the control inputs Y. Both
X = (x_1, x_2, ..., x_m) and Y are continuous time-varying vectors, where x_i
is one of the system states; they are true values, not random variables. In
fact, X may consist of a number of variables drawn from previous and current
system states and previous control inputs, while Y are the current control
inputs. Furthermore, without loss of generality, we restrict Y to be a scalar
to simplify the discussion.
Y = F̂(X) + Err, (4.73)

where F̂ is the estimate of the true relation function F and Err ∈ R is the
total learning error, which needs to be reduced below some desired value.