There are many ways to determine the number of principal components to be
retained (R). Many of these are closely related to the aforementioned
goodness-related indexes. Perhaps the simplest is the cumulative percent variance
(CPV) criterion, in which one retains as many PCs as needed to reach a
previously defined percentage of reconstruction for r_T. Further information
regarding this and other criteria can be found elsewhere [13, 15].
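As an illustration of the CPV criterion, the following minimal sketch selects R
from the eigenvalues of the sample covariance matrix. The function name
n_components_by_cpv, the 90 % default threshold and the use of NumPy are
assumptions made for this example only; they do not come from the chapter.

```python
import numpy as np

def n_components_by_cpv(X, threshold=90.0):
    """Smallest R whose cumulative percent variance reaches `threshold`.

    X has one observation per row and one variable per column.
    The function name and the default threshold are illustrative assumptions.
    """
    # Centre each variable before computing the covariance matrix.
    Xc = X - X.mean(axis=0)
    # eigvalsh returns eigenvalues in ascending order, so reverse them.
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
    # Guard against tiny negative values caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    # Cumulative percent variance explained by the first 1, 2, ... PCs.
    cpv = 100.0 * np.cumsum(eigvals) / eigvals.sum()
    # First position where cpv >= threshold, converted to a component count.
    R = int(np.searchsorted(cpv, threshold)) + 1
    # Clamp in case round-off keeps cpv[-1] marginally below the threshold.
    return min(R, eigvals.size), cpv
```

If the data have been autoscaled beforehand, the same calculation applies to
the correlation matrix instead of the covariance matrix.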
4.3 PCA-Based Monitoring Scheme
The set-up of a statistical process monitoring scheme is usually presented as a
two-stage procedure. Stage I, also referred to as the ‘off-line stage’, includes
the main calculations to build the PCA model and to estimate the threshold values
for the hypothesis testing. Although it is not frequently mentioned, a central
step is the correct pretreatment of the data set that will be used to calculate
the model. Pretreatment activities may include, but are not limited to,
measurement scaling, outlier detection, and data clustering and classification.
Data pretreatment is not usually covered in research papers dealing with process
monitoring; nevertheless, the success of the whole monitoring strategy strongly
depends on it. Pretreatment-related activities constitute a whole subject by
themselves and are outside the scope of this chapter. Descriptions of the most
commonly used tools for data pretreatment, their purposes and limitations can be
found elsewhere [13]. The main objective of data pretreatment is to obtain a data
set that can be considered a good sample of the process under normal operation
conditions (NOC), because it will be used to build the PCA model and to determine
the normal region.
Let us consider X to be the process data matrix after an appropriate
pretreatment. As is usually the case, process observations are made up of various
measured variables with different measurement units and variability ranges. As a
consequence, a very common step during stage I is variable scaling or
standardization. This procedure is useful both for putting measurements on a
common unitless scale and for avoiding undesired side effects that can arise from
working with variables in very different scales and ranges. Once X has been
properly scaled, the PCA model is calculated as explained in the previous chapter
“Modelling Syngas Generation” in Sect. 2.1, and the normal operation regions have
to be determined. This PCA model serves as the basis for comparison of new
metrics; consequently, the adequacy of these approaches depends heavily on how
well the PCA model represents the plant behaviour.
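The scaling and model-building steps of stage I can be sketched as follows.
This is only an illustration under stated assumptions: autoscaling (zero mean,
unit standard deviation) is taken as the scaling method, the model is obtained
through a singular value decomposition, no variable is constant in the NOC
data, and the names fit_pca_model and scale_observations are hypothetical.

```python
import numpy as np

def fit_pca_model(X_noc, R):
    """Autoscale NOC data and fit a PCA model with R components.

    X_noc has one observation per row. Assumes no variable is constant,
    otherwise its standard deviation would be zero.
    """
    mean = X_noc.mean(axis=0)
    std = X_noc.std(axis=0, ddof=1)
    Xs = (X_noc - mean) / std
    # Thin SVD of the scaled data: Xs = U diag(s) Vt.
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    # Eigenvalues of the sample covariance of Xs, in decreasing order.
    eigvals = s**2 / (X_noc.shape[0] - 1)
    return {
        "mean": mean,            # scaling parameters, reused in stage II
        "std": std,
        "P": Vt[:R].T,           # loadings, one column per retained PC
        "eigvals": eigvals[:R],  # variance of each retained score
    }

def scale_observations(X_new, model):
    """Scale new observations with the parameters estimated from NOC data."""
    return (X_new - model["mean"]) / model["std"]
```

New observations are scaled with the mean and standard deviation estimated
from the NOC data, not with their own statistics, so that they remain
comparable with the reference model.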
It is very common practice in PCA-based monitoring approaches to use two
complementary statistics to follow the process evolution. The typical statistics
are Hotelling’s T^2 [16] and the squared prediction error (SPE).
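A minimal sketch of how the two statistics are commonly computed for scaled
observations is given below. It assumes the usual textbook definitions, which
are not spelled out in this section: T^2 as the sum of squared scores, each
divided by the variance of its component, and SPE as the squared norm of the
residual left after projecting onto the retained components. The function name
t2_and_spe and its arguments are hypothetical, and the threshold values are
not covered here.

```python
import numpy as np

def t2_and_spe(X_scaled, P, eigvals):
    """Hotelling's T^2 and SPE for scaled observations.

    X_scaled: one observation per row (a single vector is also accepted).
    P:        loadings with orthonormal columns, one per retained PC.
    eigvals:  variance of each retained score, in the same order as P.
    """
    X = np.atleast_2d(X_scaled)
    # Scores: projection of each observation onto the retained PCs.
    T = X @ P
    # T^2: squared scores weighted by the inverse score variances.
    t2 = np.sum(T**2 / eigvals, axis=1)
    # Residual: the part of each observation the model does not reconstruct.
    residual = X - T @ P.T
    # SPE: squared Euclidean norm of the residual.
    spe = np.sum(residual**2, axis=1)
    return t2, spe
```

With this split, T^2 measures variation inside the subspace spanned by the
retained components, while SPE measures what falls outside it, which is why
the two statistics are regarded as complementary.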
The T^2 statistic is widely employed in multivariate systems analysis; it was
proposed as a generalisation of Student’s t distribution to the multivariate
case. It is based on the Mahalanobis distance, which gives it the capacity for