Nabil Semmar 2
According to the context, such matrices can contain (a) binary codes formulating the
adjacencies between metabolites, (b) stoichiometric coefficients of metabolic reactions, (c)
transition probabilities between different metabolic states, (d) partial derivatives of the system
with respect to small perturbations, (e) contributions of different metabolic pathways, etc. Such
matrices are used to describe/handle the complex structures, processes and evolutions of
metabolic systems. General applications and interests of these different matrix-based
approaches are illustrated in a first general section of the chapter, followed by a second
detailed section on the correlation and distance-based analyses.
I. Introduction
Metabolomics aims at unbiased and comprehensive analysis of the biosynthesis,
regulation, distribution and control processes of the metabolites in cells, tissues or organisms
(Figure 1) (Goodacre et al., 2004; Sumner et al., 2003; Kell, 2004; Sweetlove and Fernie,
2005; Fernie et al., 2004). It is a multidisciplinary field including many approaches which
analyse the metabolites’ content of a biological system in relation to several biological factors
(genome, proteome, physiology, environment) leading to a better understanding of the
organization, behaviour and control of metabolic networks (Olivier et al., 1998; Roessner et
al., 2001; Nicholson et al., 1999; Kell, 2002; Ott et al., 2003; Weckwerth, 2003).
Metabolism represents a complex system characterized by a great variability of chemical
structures, biosynthesis levels, regulation ratios and flux distributions of metabolites (Kacser
and Burns, 1973; Savageau, 1976; Atkinson, 1977; Hayashi and Sakamoto, 1986; Fell, 1996;
Heinrich and Schuster, 1996). Such complex variability can be observed from continuums of
metabolic profiles in which metabolites vary qualitatively and quantitatively, some in
favour of or at the expense of others. Consequently, statistical methods are needed to detect,
quantify, classify and associate different kinds of variations at metabolite and at metabolic
pathway levels.
Statistically, the metabolic variability is analysed from a dataset or matrix consisting of n
rows (or n profiles) and p columns (p metabolites). Therefore, three kinds of variability can
be analysed, viz. along the rows, along the columns and by associating rows and columns
(Nicholson et al., 1999; Semmar et al., 2001, 2005a, 2007, 2008; Lindon et al., 2007; Denkert
et al., 2008):
Column analysis is closely linked to a correlation screening between variables. The set of
different correlations between metabolites (variables) helps to detect different trends that can
be interpreted as different metabolic pathways in the metabolic network. Row analysis aims
to quantify similarities between individual profiles by calculating distances or similarity
indices. The resulting distance or similarity matrix can be used to classify
profiles into different groups that can be interpreted in terms of different polymorphism poles.
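As a minimal sketch of the column and row analyses described above, the following Python example builds a hypothetical dataset of n = 5 profiles and p = 3 metabolites, then computes a correlation matrix between columns and a distance matrix between rows. The values, the choice of the Pearson coefficient and the choice of the Euclidean distance are assumptions made for illustration only; the text does not prescribe a particular coefficient or distance.

```python
import numpy as np

# Hypothetical toy dataset (not from the chapter):
# n = 5 profiles (rows) x p = 3 metabolites (columns).
X = np.array([
    [10.0, 20.0, 70.0],
    [12.0, 24.0, 64.0],
    [30.0, 60.0, 10.0],
    [28.0, 56.0, 16.0],
    [20.0, 40.0, 40.0],
])

# Column analysis: p x p Pearson correlation matrix between metabolites.
# rowvar=False tells NumPy that variables are stored in columns.
R = np.corrcoef(X, rowvar=False)

# Row analysis: n x n Euclidean distance matrix between profiles,
# obtained by broadcasting all pairwise row differences.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

print(np.round(R, 2))
print(np.round(D, 2))
```

In this toy example the first two metabolites vary together while the third varies at their expense, so the correlation matrix contains both strongly positive and strongly negative entries. The distance matrix shows that profiles 1 and 2 are much closer to each other than to profiles 3 and 4, which is the kind of structure a subsequent classification would exploit.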
Association analysis between rows and columns provides complementary information
concerning original or atypical profiles due to relatively high (or low) values for some
metabolites. Such analysis is closely linked to outlier diagnostics, which use different kinds
of distance to detect atypical profiles according to different statistical criteria. The application of
different outlier diagnostic criteria makes it possible to check whether atypical profiles are confirmed by
several criteria or highlighted by only one criterion.
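The comparison of outlier criteria can be sketched as follows. The example is illustrative only: the two criteria used here (Euclidean distance and Mahalanobis distance to the centroid) and the synthetic two-metabolite dataset are assumptions, not the specific diagnostics of the chapter. Two atypical profiles are planted on purpose, one far along the main trend and one of moderate magnitude but against the correlation between the two metabolites.

```python
import numpy as np

# Hypothetical data: 21 regular profiles in which two metabolites
# are strongly correlated, plus two planted atypical profiles.
t = np.linspace(-2.0, 2.0, 21)
wiggle = np.where(np.arange(21) % 2 == 0, 0.1, -0.1)
regular = np.column_stack([t, t + wiggle])
X = np.vstack([regular, [6.0, 6.0], [2.0, -2.0]])  # rows 21 and 22

centre = X.mean(axis=0)
centred = X - centre

# Criterion 1 (assumed): Euclidean distance to the centroid.
euclid = np.sqrt((centred ** 2).sum(axis=1))

# Criterion 2 (assumed): Mahalanobis distance, which rescales the
# deviations by the inverse covariance matrix of the metabolites.
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
mahal = np.sqrt(np.einsum("ij,jk,ik->i", centred, cov_inv, centred))

# Most atypical profile according to each criterion.
top_euclid = int(np.argmax(euclid))
top_mahal = int(np.argmax(mahal))
print(top_euclid, top_mahal)
```

Here the Euclidean criterion ranks the profile (6, 6) first because of its sheer magnitude, whereas the Mahalanobis criterion ranks the profile (2, -2) first because it contradicts the correlation between the two metabolites. Profile (2, -2) is therefore highlighted by only one of the two criteria, which is the situation the comparison of diagnostics is meant to reveal.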
Apart from these three basic statistical analyses (column-, row-, and association-
analyses), helping to describe the variability of metabolic datasets under correlation,