include all the necessary information to determine the learning output. This
means that the input selection process will not produce any new features.
With random sampling or experimental designs, several factors can be varied
independently: factors can be varied one at a time (OAT) while all the other
factors are held constant. Modeling human control strategy, however, is not
an OAT process. All the effective factors (system states or parameters) are
nonlinearly related to one another, i.e., if one of these factors changes, then
all the other key parameters change as well.
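To make the contrast concrete, the short sketch below (with purely illustrative factor names and values) builds an OAT design matrix in which each experiment perturbs exactly one factor from its baseline; data recorded from a human operator never arrives in this decoupled form.

```python
import numpy as np

# Hypothetical baseline settings and perturbation steps for three factors;
# the numbers are illustrative only.
baseline = np.array([1.0, 0.5, 2.0])
steps = np.array([0.1, 0.05, 0.2])

# OAT design: each row perturbs a single factor, all others stay at baseline.
oat_design = np.tile(baseline, (len(baseline), 1))
oat_design[np.diag_indices(len(baseline))] += steps
print(oat_design)
# [[1.1  0.5  2.  ]
#  [1.   0.55 2.  ]
#  [1.   0.5  2.2 ]]
```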
Different feature selection methods have been analyzed in the past. Based
on information theory, Battiti [9] proposed the application of mutual informa-
tion criteria to evaluate a set of candidate features and to select an informative
subset to be used as input features for a neural network classifier. In [54], inter-
class and intraclass distances are used as the criterion for feature selection. A
method proposed by Thawonmas [105] performs an analysis of fuzzy regions.
All these approaches are suited to classification problems, whereas modeling
human control strategy is a regression problem. A large set of input selection
methods are based on the analysis of a trained multilayer feedforward neural
network or other specific network architectures [17], [25], [69], [101]. Although
the feature subsets found by these methods are optimal for their particular
network architectures, they may not be suitable or optimal for other network
architectures. Our method, while producing similar results in test cases, is
applied before learning starts and therefore does not depend on the learning
process.
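As an illustration of the mutual-information criterion cited above, the sketch below ranks candidate regression inputs by their estimated mutual information with the output, using scikit-learn's mutual_info_regression on synthetic data. Note that Battiti's full scheme additionally penalizes redundancy among already-selected features, which this fragment omits.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)

# Synthetic data: y depends nonlinearly on x0 and x1; x2 is pure noise.
X = rng.standard_normal((500, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(500)

# Estimate the mutual information between each candidate input and the
# output, then rank the features by informativeness.
mi = mutual_info_regression(X, y, random_state=0)
ranking = np.argsort(mi)[::-1]
print("MI scores:", np.round(mi, 3), "ranking:", ranking)
```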
In [85], different feature evaluation methods are compared. In particular, the
method based on principal component analysis (PCA) evaluates features
according to the projection of the largest eigenvector of the correlation matrix
onto the initial dimensions. A major weakness of these methods is that they
are not invariant under transformations of the variables. Some nonlinear PCA
approaches are examined in the literature [50], [51]; in fact, the process of
building a nonlinear PCA is similar to the training process for a feedforward
neural network.
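The PCA-based evaluation can be sketched as follows: each feature is scored by the magnitude of its loading on the principal eigenvector of the correlation matrix. The data are synthetic, and the scoring rule is our reading of the criterion in [85], not code taken from that reference.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: x1 is a noisy copy of x0, x2 is independent of both.
x0 = rng.standard_normal(400)
X = np.column_stack([x0,
                     x0 + 0.1 * rng.standard_normal(400),
                     rng.standard_normal(400)])

# Score each feature by the magnitude of its loading on the largest
# eigenvector of the correlation matrix.
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
scores = np.abs(eigvecs[:, -1])          # principal eigenvector loadings
print("feature scores:", np.round(scores, 3))
```

In this example the correlated pair dominates the first principal component, so the redundant copy scores as highly as the original, which hints at why such criteria can mislead.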
Another large class of methods comprises filters, wrappers, and combinations
of the two [19], [56], [68]. These methods focus on a search strategy for the
best feature subset; the selection criteria are similar to those of the approaches
above. The search strategy is very important for classification problems with
hundreds or thousands of features. For regression problems, however, there
are usually far fewer features.
Feature selection is one of the initial steps in the learning process. We there-
fore propose to search, through some relatively simple and efficient procedures,
for the smallest subset of the full feature set. This input selection process
ensures that learned models are not over-parameterized and can generalize
beyond the narrow training data. Our approach can be divided into three
subtasks (a schematic sketch follows the list):
1. significance analysis of the full feature set;
2. dependence analysis among a selected set of features;
3. self-contained analysis between the selected set of features and the learning
output.
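The schematic sketch below shows one way such a three-stage procedure could be wired together. The concrete criteria used here (a mutual-information significance score, a pairwise-correlation dependence test, and the R^2 of a linear fit as a self-containedness check), as well as the thresholds, are illustrative stand-ins, not the criteria developed in this chapter.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model import LinearRegression

def select_inputs(X, y, sig_thresh=0.1, dep_thresh=0.95):
    """Schematic three-stage input selection; all criteria are stand-ins."""
    # 1. Significance analysis: keep features that carry information
    #    about the learning output.
    mi = mutual_info_regression(X, y, random_state=0)
    keep = [i for i in range(X.shape[1]) if mi[i] > sig_thresh]

    # 2. Dependence analysis: among the retained features, drop the later
    #    member of any nearly deterministic pair.
    corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
    positions = []
    for a in range(len(keep)):
        if all(corr[a, b] < dep_thresh for b in positions):
            positions.append(a)
    selected = [keep[a] for a in positions]

    # 3. Self-contained analysis: check that the selected subset alone
    #    still explains the output (here with a linear fit; a nonlinear
    #    model would be used in practice).
    r2 = LinearRegression().fit(X[:, selected], y).score(X[:, selected], y)
    return selected, r2

rng = np.random.default_rng(2)
x0 = rng.standard_normal(300)
X = np.column_stack([x0, 2.0 * x0, rng.standard_normal(300)])
y = 3.0 * x0 + 0.1 * rng.standard_normal(300)
print(select_inputs(X, y))  # keeps feature 0; drops its copy and the noise
```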