- Standardized PCA is applied from the square matrix Z’Z, with $Z = (X - \bar{X})/SD$,
where $\bar{X}$ and SD are the mean and standard deviation of each corresponding
variable, respectively.
- Rank-based PCA is applied on the square matrix K’K, where K is the rank matrix
representing the ranked data for each variable of dataset X.
These different kinds of PCA have different conditions of application and serve
different purposes:
Centred PCA is applied when all the variables have the same unit (e.g.
µg/mL). Its interest lies in highlighting the effect of the most dispersed variables on the
structure of the dataset: the most dispersed variables are considered richer in
information than the less dispersed ones. Centred PCA helps to identify how the individuals
(profiles) are separated from one another under the dispersion effect of some variables.
Moreover, such a multivariate analysis allows classification of the different variables
according to their variation scales and directions (i.e. according to their covariances). In
centred PCA, the sum of the eigenvalues is equal to the total variance of the dataset.
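The eigenvalue property stated above can be checked numerically. The following is a minimal sketch, not the author's implementation: it assumes that the matrix of centred data C is scaled as C’C/(n − 1), so that the eigenvalues are variances, and it uses randomly generated (hypothetical) data in which the three variables share one unit but differ in dispersion. The function name `centred_pca` is illustrative.

```python
import numpy as np

def centred_pca(X):
    """Centred PCA: eigen-decomposition of the covariance matrix of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    C = X - X.mean(axis=0)               # centre each variable (column)
    cov = C.T @ C / (n - 1)              # assumption: C'C scaled by 1/(n-1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]    # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = C @ eigvecs                 # coordinates of the individuals
    return eigvals, eigvecs, scores

# Hypothetical data: 9 profiles x 3 variables, same unit, unequal dispersions
rng = np.random.default_rng(0)
X = rng.normal(size=(9, 3)) * np.array([10.0, 2.0, 0.5])

eigvals, eigvecs, scores = centred_pca(X)
total_variance = X.var(axis=0, ddof=1).sum()

print("sum of eigenvalues:", eigvals.sum())
print("total variance    :", total_variance)
```

The two printed values agree up to rounding, because the trace of the covariance matrix equals the sum of its eigenvalues.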
Standardized PCA is required when the dataset consists of heterogeneous variables
expressed in different measurement units (µg, mL, °C, etc.). It is also required when the
variables have different variation scales due to incomparable variances. In these cases, the
values of each variable $X_j$ are standardized by subtracting the mean $\bar{X}_j$ and dividing by
the standard deviation $SD_j$. Graphically, standardization gives the variables
relative positions that are interpretable in terms of Pearson correlations: the co-response
of two variables is shown by two vectors projected along the
same direction in the multivariate space. If two variables are positively correlated, their
corresponding vectors form a very acute angle θ (0 ≤ θ ≤ π/4); in the case of negatively
correlated variables, the corresponding vectors are opposite, i.e. their angle is
strongly obtuse (3π/4 ≤ θ ≤ π). In the case of low correlations, the two vectors corresponding to
the paired variables have almost perpendicular directions. In standardized PCA, the sum
of the eigenvalues is equal to the number (p) of variables.
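The link between angles and Pearson correlations, and the sum of the eigenvalues, can be illustrated with a short sketch. It rests on two assumptions that are not stated in the text: Z’Z is scaled by 1/(n − 1), which turns it into the correlation matrix, and the angles are measured between the standardized variable vectors in the full n-dimensional space (in a two-dimensional factorial plot the relation is only approximate). The data are randomly generated and purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=9)

# Hypothetical data: 9 individuals x 4 variables on very different scales
X = np.column_stack([
    100.0 * base + rng.normal(size=9),           # follows `base`
    -0.01 * base + 0.001 * rng.normal(size=9),   # opposes `base`
    rng.normal(size=9),                          # unrelated to `base`
    50.0 + 5.0 * rng.normal(size=9),             # unrelated to `base`
])
n, p = X.shape

# Standardization: subtract the mean, divide by the standard deviation
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Assumption: Z'Z scaled by 1/(n-1) gives the Pearson correlation matrix
R = Z.T @ Z / (n - 1)
eigvals = np.linalg.eigvalsh(R)[::-1]            # decreasing order

# Cosine of the angle between two standardized variable vectors
norms = np.linalg.norm(Z, axis=0)
cosines = (Z.T @ Z) / np.outer(norms, norms)
angles = np.arccos(np.clip(cosines, -1.0, 1.0))  # radians, in [0, pi]

print("sum of eigenvalues:", eigvals.sum(), "for p =", p)
print("angle between variables 1 and 2 (rad):", angles[0, 1])
```

Each standardized column has norm √(n − 1), so the cosine matrix coincides with R: a correlation close to +1 gives an angle close to 0, a correlation close to −1 gives an angle close to π, and a correlation close to 0 gives an angle close to π/2.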
Rank-based PCA applies primarily to ordinal qualitative datasets, where the
variables are not measured but consist of different classification modalities of the individuals
(e.g. modalities low, intermediate, high levels). After substitution of the ordinal data by their
ranks, a standardized PCA can be applied to analyse correlations between the qualitative
variables on the basis of Spearman statistics. Rank-based PCA can also be applied to
datasets that are heterogeneous because of different variable units or imbalanced variation
ranges of the variables.
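The substitution of ordinal data by ranks, followed by a standardized PCA, can be sketched as follows. This is an illustration under stated assumptions: tied modalities receive the mean of their rank positions (the usual convention for Spearman statistics), K’K is used after standardization and scaling by 1/(n − 1), and the small table of modalities is invented for the example. The helper `average_ranks` is a hypothetical name.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    x = np.asarray(x)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)       # last position occupied by each tied group
    first = last - counts + 1      # first position occupied by each tied group
    return ((first + last) / 2.0)[inverse]

# Hypothetical ordinal dataset: 6 individuals x 3 qualitative variables
levels = {"low": 0, "intermediate": 1, "high": 2}
data = [
    ["low",          "low",          "high"],
    ["low",          "intermediate", "high"],
    ["intermediate", "intermediate", "intermediate"],
    ["intermediate", "high",         "low"],
    ["high",         "high",         "low"],
    ["high",         "intermediate", "low"],
]
codes = np.array([[levels[v] for v in row] for row in data])

# Rank matrix K: each variable (column) is replaced by its ranks
K = np.column_stack([average_ranks(codes[:, j]) for j in range(codes.shape[1])])
n, p = K.shape

# Standardized PCA on the ranks: the correlation matrix of K holds
# Spearman correlations between the original ordinal variables
Zk = (K - K.mean(axis=0)) / K.std(axis=0, ddof=1)
R = Zk.T @ Zk / (n - 1)
eigvals, eigvecs = np.linalg.eigh(R)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # decreasing order

print(K)
print(np.round(R, 3))
```

Because the analysis is a standardized PCA carried out on ranks, the eigenvalues again sum to the number p of variables.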
IV.1.6.6. Numerical Application and Interpretation of Standardized PCA
The application of standardized PCA will be illustrated by a numerical example based on
a dataset of n=9 rows and p=5 columns (Figure 41). From a metabolomic point of view, let us
consider the rows as metabolic profiles, the columns as metabolites and the data as
concentrations.