Figure 4. Distance matrix tree graphs for evolutionary distance of the species and Euclidean distance of the DNA-motifs. Tree graphs display the relative relatedness of the organisms on the basis of their protein sequence similarities (A) or the similarities between the DNA-motif variances of all hexanucleotides (B) and known CREs (C). Similarities and dissimilarities between the tree topologies in A-C are highlighted by lines that interconnect the four organisms between the graphs.

different dataset sizes or DNA-base composition (GC/AT-content). Our presumption was that information content within promoters is retained as positional disequilibria in promoter sequences when observed at a genomic scale. As a simple but effective measurement, we took the variance of a DNA-motif's frequency distribution to indicate the degree of information that motif carries, and compared these variances between species to capture global similarities in promoter architecture. To draw conclusions from these interspecies comparisons, we established distance matrices for protein sequence similarity and for the variance of hexanucleotides or CREs; these results were displayed as tree graphs (Fig. 4).
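
The variance measure and the Euclidean distance matrix described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and it assumes that promoters are equal-length sequences aligned to a common reference point, that the frequency distribution is the per-position relative frequency of a motif, and that the population variance is used. The source does not state the exact normalization.

```python
from math import sqrt
from statistics import pvariance


def positional_counts(promoters, motif):
    """Count how often `motif` starts at each position across the promoters."""
    k = len(motif)
    length = min(len(p) for p in promoters)  # assumption: aligned promoters
    counts = [0] * (length - k + 1)
    for seq in promoters:
        for i in range(length - k + 1):
            if seq[i:i + k] == motif:
                counts[i] += 1
    return counts


def motif_variance(promoters, motif):
    """Variance of the positional relative-frequency distribution of a motif.

    A motif spread evenly over all positions gives 0; a motif concentrated
    at a few positions (a positional disequilibrium) gives a larger value.
    """
    counts = positional_counts(promoters, motif)
    freqs = [c / len(promoters) for c in counts]
    return pvariance(freqs)


def variance_profile(promoters, motifs):
    """One variance per motif, in the order given by `motifs`."""
    return [motif_variance(promoters, m) for m in motifs]


def euclidean(u, v):
    """Euclidean distance between two equally long variance profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(profiles):
    """Pairwise Euclidean distances between species variance profiles."""
    names = sorted(profiles)
    matrix = [[euclidean(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


if __name__ == "__main__":
    # Toy sequences invented for illustration only; not data from the study.
    toy = {
        "species_1": ["TATAAAGGCC", "TATAAACCGG"],
        "species_2": ["GGTATAAACC", "CCGGTATAAA"],
    }
    motifs = ["TATAAA", "GGCC"]
    profiles = {name: variance_profile(seqs, motifs) for name, seqs in toy.items()}
    names, matrix = distance_matrix(profiles)
    for name, row in zip(names, matrix):
        print(name, row)
```

In practice `motifs` would be all 4096 hexanucleotides or a list of known CREs, so each species is represented by one vector of variances and the distance matrix compares those vectors.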

The last common ancestor of dicot (Arabidopsis and Populus) and monocot (Sorghum and Oryza) species lived approximately 140 million years ago [10]. The phylogenetic tree in Figure 4A captures this evolutionary time and serves as our reference.
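
A tree topology such as the reference in Figure 4A can be derived from a distance matrix by hierarchical clustering. The source does not state which tree-building method was used; the sketch below assumes UPGMA (average linkage) purely for illustration. The function name `upgma` is hypothetical and the distance values are invented placeholders, not results from the study.

```python
def upgma(names, dist):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Returns the tree topology as nested tuples of leaf names.
    """
    # Each cluster id maps to (subtree, number of leaves in the subtree).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to every remaining cluster is the
        # size-weighted mean of the distances from its two parts.
        new = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop all distances that involve the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    # Placeholder distances chosen only to illustrate a dicot/monocot split.
    species = ["Arabidopsis", "Populus", "Sorghum", "Oryza"]
    placeholder = [
        [0.0, 1.0, 8.0, 9.0],
        [1.0, 0.0, 8.5, 9.5],
        [8.0, 8.5, 0.0, 2.0],
        [9.0, 9.5, 2.0, 0.0],
    ]
    print(upgma(species, placeholder))
```

Running the same function on the protein-based matrix and on the motif-variance matrices would give topologies that can be compared side by side, which is the kind of comparison Figure 4 draws.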

The Oryza dataset appears to be more closely related to Arabidopsis than to its evolutionarily close relative Sorghum when looking at the hexanucleotide variances (Fig. 4B). One possible explanation for this similarity in hexanucleotide composition is that both species have relatively small, compressed genomes [11]. As a consequence, the intergenic space, i.e. the DNA-region between the genes, is relatively small, and more information must therefore be packed into shorter promoter regions [3]. The Arabidopsis thaliana genome is one of the smallest eukaryotic genomes [4] and, thus, regulatory promoter sequences must be short and enriched for motifs with a high order of information. This notion might be true for Oryza as well, as its genome is one of the smallest among Poaceae crop plants.

The CRE variance is shown in Figure 4C. Here, the overall tree topology follows our reference tree (Fig. 4A), with the exception of the Populus