Correlations - and Distances - Based Approaches to Static Analysis… 23
For thin or weakly dispersed clouds of points (Figure 19a, f), relationships between variables
can be quantified by means of the Pearson correlation coefficient. In the case of more dispersed
data (Figure 19b, c, h), the Spearman correlation coefficient can be used as a robust statistic to
detect trends between variables (metabolites). Positive (Figure 19a-c, f) and negative (Figure
19d, g) relationships are indicated by positive and negative correlation coefficients,
respectively.
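As a minimal sketch of the difference between the two coefficients, the following pure-Python example computes both on small, made-up datasets. The function names (pearson, ranks, spearman) and the data are illustrative assumptions, not taken from the chapter; Spearman is computed here as the Pearson coefficient of the average ranks.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: a monotonic but curved trend, and a decreasing line.
x = [1, 2, 3, 4, 5, 6]
y_curved = [math.exp(v) for v in x]
y_decreasing = [-2 * v + 1 for v in x]

r_pearson_curved = pearson(x, y_curved)      # positive, but below 1
r_spearman_curved = spearman(x, y_curved)    # monotonic trend fully detected
r_pearson_decreasing = pearson(x, y_decreasing)  # negative relationship

print(r_pearson_curved, r_spearman_curved, r_pearson_decreasing)
```

On the curved data the rank-based coefficient captures the monotonic trend that the Pearson coefficient underestimates, while the sign of either coefficient gives the direction of the relationship.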
Pearson correlation is sensitive to non-linearity in the data (Figure 19d, g, h). In the case
of curvilinear relationships, the Pearson coefficient can still be applied after the data have
been linearized by an appropriate transformation. Appropriate transformations make the
distributions of the data more symmetrical (closer to normal) by reducing their dispersion,
their asymmetry and the bias due to isolated (extreme) points (Zar, 1999). Such transformations can
be applied to only one or to both variables of the pair (X, Y).
Moreover, such transformations are applied to stabilize the variances between several
groups of the dataset, i.e. in the case of heteroscedastic data (non-comparable variances
between groups). The resulting homoscedasticity then makes it possible to apply
the linear model.
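The variance-stabilizing effect can be illustrated with a small sketch. The group means and replicate factors below are hypothetical, chosen so that the spread of each group grows with its mean; the helper variance_ratio is likewise an illustrative name. Under this assumption a logarithmic transformation makes the group variances comparable.

```python
import math
import statistics

# Hypothetical replicate factors and group means: each group is its mean
# multiplied by the same factors, so its spread is proportional to its mean.
factors = [0.80, 0.90, 1.00, 1.10, 1.25]
group_means = [10.0, 100.0, 1000.0]
groups = [[m * f for f in factors] for m in group_means]

def variance_ratio(samples):
    """Largest group variance divided by the smallest group variance."""
    variances = [statistics.variance(g) for g in samples]
    return max(variances) / min(variances)

raw_ratio = variance_ratio(groups)
log_ratio = variance_ratio([[math.log(v) for v in g] for g in groups])

print("variance ratio, raw data:", raw_ratio)
print("variance ratio, log data:", log_ratio)
```

A ratio far above 1 signals heteroscedasticity; a ratio close to 1 after transformation suggests that the variances have been stabilized for these data.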
IV.1.2. Data Transformation for Application of the Linear Model
Based on a graphical visualisation, a curvilinear cloud of points (Y vs X) can be transformed
into linear form by using an appropriate formula (Zar, 1999; Legendre and Legendre, 2000).
Such a formula depends on the shape, intensity of curvature and number of inflection point(s)
of the cloud of points Y vs X (Figure 20).
Logarithmic transformations are appropriate to linearize curves showing slow (i) or
accelerated (ii) variations of Y vs X after an inflection (Figure 21). In the first case (i) (Figure
21a), linearization is obtained from Y vs ln(X); in the second case (ii) (Figure 21b),
linearization is obtained from ln(Y) vs X. More precisely, the function $Y = a e^{bX}$ is linearized
by taking the log of Y to give a straight-line equation with intercept $\ln(a)$ and slope $b$, i.e.
$\ln(Y) = \ln(a) + bX$. In the case where Y and X are linked by a power function $Y = a X^{c}$, such a
non-linear relationship can be linearized by taking the logarithms of both X and Y, giving the
linear equation $\ln(Y) = \ln(a) + c \ln(X)$ (Figure 21c). In general, for a curvilinear cloud of
points, the appropriate model can be identified from the transformation under which the points
become aligned (Figure 21).
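The three linearizations can be sketched as ordinary least-squares fits on transformed data. The example below is a hedged illustration: the helper fit_line, the noise-free data and the parameter values are assumptions made for the demonstration, and the form Y = a + b ln(X) for case (i) is one plausible reading of "Y vs ln(X)".

```python
import math

def fit_line(u, v):
    """Ordinary least-squares fit v = intercept + slope * u."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    slope = (sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
             / sum((a - mean_u) ** 2 for a in u))
    return mean_v - slope * mean_u, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ln_x = [math.log(v) for v in x]

# Case (i): Y = a + b ln(X), assumed form; regress Y on ln(X).
y_log = [1.0 + 2.0 * math.log(v) for v in x]
a_log, b_log = fit_line(ln_x, y_log)

# Case (ii): Y = a e^{bX}; regress ln(Y) on X, intercept is ln(a).
y_exp = [2.0 * math.exp(0.5 * v) for v in x]
ln_a_exp, b_exp = fit_line(x, [math.log(v) for v in y_exp])
a_exp = math.exp(ln_a_exp)

# Power function: Y = a X^c; regress ln(Y) on ln(X), slope is c.
y_pow = [3.0 * v ** 1.5 for v in x]
ln_a_pow, c_pow = fit_line(ln_x, [math.log(v) for v in y_pow])
a_pow = math.exp(ln_a_pow)

print(a_log, b_log)
print(a_exp, b_exp)
print(a_pow, c_pow)
```

Because the synthetic data are noise-free, each fit recovers the generating parameters up to rounding; with real measurements the back-transformed intercept is only an estimate.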
Taking into account the distribution of each variable, a logarithmic transformation is
indicated for a right-skewed distribution, i.e. one having its mode located at the left (a majority
of low values). The logarithmic transformation then yields a more symmetrical
distribution, i.e. a distribution closer to normality, which makes it possible to
apply the linear model (Figure 22).
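A quick way to check this effect is to compare a skewness coefficient before and after transformation. The sketch below uses the moment-based coefficient m3 / m2^1.5 and a made-up right-skewed sample; both the function name skewness and the values are illustrative assumptions.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample: many low values, a few large ones.
raw = [1, 1, 2, 2, 3, 4, 6, 10, 25, 80]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)

print("skewness, raw data:", skew_raw)
print("skewness, log data:", skew_log)
```

A coefficient closer to zero after transformation indicates a more symmetrical distribution for this sample; it does not by itself establish normality.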
A square root transformation can be applied to linearize a parabolic cloud of points.
Moreover, the square root can be preferred to the (more generally used) logarithmic
transformation in the case of a small dataset (small number of observations). Graphically, models
requiring a square root transformation show a gentler curvature than those requiring
a logarithmic transformation (Figure 20a).
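As an illustration, the sketch below applies a square root to a hypothetical parabolic relationship, assumed here to follow Y = (1 + 0.5 X)^2 with non-negative Y, and compares the Pearson coefficient before and after transformation. The data and the helper pearson are assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical parabolic data (noise-free), with Y >= 0 so sqrt is defined.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y = [(1.0 + 0.5 * v) ** 2 for v in x]

r_raw = pearson(x, y)                           # curved cloud: below 1
r_sqrt = pearson(x, [math.sqrt(v) for v in y])  # sqrt(Y) vs X is a line

print(r_raw, r_sqrt)
```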
Clouds of points can also be linearized by means of polynomial transformations. This is
generally applied when several inflection points are observed: clouds
with k inflection points can be fitted by means of polynomials of degree k+1 (Figure 20d).
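A polynomial fit can be sketched in pure Python by solving the least-squares normal equations. In this hedged example the degree is left as a parameter, the helpers polyfit and rss are illustrative names, and the data are generated from a cubic chosen only for demonstration; for real data a numerical library would usually be preferable.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [c0, c1, ..., c_degree]."""
    m = degree + 1
    # Normal equations A c = b, with A[i][j] = sum x^(i+j), b[i] = sum y*x^i.
    A = [[sum(v ** (i + j) for v in x) for j in range(m)] for i in range(m)]
    b = [sum(w * v ** i for v, w in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs

def rss(x, y, coeffs):
    """Residual sum of squares of the fitted polynomial."""
    return sum((w - sum(c * v ** k for k, c in enumerate(coeffs))) ** 2
               for v, w in zip(x, y))

# Hypothetical data following Y = X^3 - 3X on a regular grid from -3 to 3.
x = [i / 2 for i in range(-6, 7)]
y = [v ** 3 - 3 * v for v in x]

line = polyfit(x, y, 1)
cubic = polyfit(x, y, 3)
rss_line = rss(x, y, line)
rss_cubic = rss(x, y, cubic)

print(cubic, rss_line, rss_cubic)
```

Comparing the residual sums of squares for increasing degrees gives a simple way to see when adding a term stops improving the fit.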