genome sequence will be finished in the near future, is more closely related to rice than it is to the dicot Arabidopsis [7].
Despite their economic importance, trees have a major disadvantage for genetic studies: their naturally long life cycle, compared to that of cryptogams, makes laboratory experiments tedious or infeasible [8].
Nevertheless, the genomic sequence of the western balsam poplar (Populus trichocarpa) has been submitted to public databases [8] and can readily be analyzed with bioinformatics routines.
Here, we conduct an interspecies promoter motif analysis to assess whether the architecture of the core promoters and the distribution of CREs therein are evolutionarily conserved. To this end, we extracted the upstream sequences of annotated genes from Arabidopsis, rice, poplar and Sorghum; these regions contain the proximal promoters and the most essential CREs.
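The extraction step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes 1-based inclusive gene coordinates on the forward strand (as in GenBank feature tables), a hypothetical fixed upstream length `length` (the value used in this study is not stated here), and clipping at chromosome ends. For minus-strand genes the upstream region lies beyond the annotated end, so it is reverse-complemented to read 5' to 3' toward the gene.

```python
# Sketch: strand-aware extraction of the region upstream of an annotated gene.
# Assumptions (not from the source): 1-based inclusive coordinates on the
# forward strand, a hypothetical fixed upstream `length`, clipping at ends.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom: str, start: int, end: int, strand: str,
                    length: int = 1000) -> str:
    """Return up to `length` bp immediately upstream of a gene, read 5'->3'.

    `start`/`end` are 1-based inclusive forward-strand coordinates;
    `strand` is "+" or "-".
    """
    if strand == "+":
        # Upstream lies left of the gene start on the forward strand.
        lo = max(0, start - 1 - length)
        return chrom[lo:start - 1]
    if strand == "-":
        # Upstream lies right of the gene end; read it on the minus strand.
        hi = min(len(chrom), end + length)
        return reverse_complement(chrom[end:hi])
    raise ValueError(f"unknown strand: {strand!r}")
```

Clipping means genes near a chromosome end yield shorter upstream sequences; a real pipeline might instead discard them or also stop at the neighbouring gene.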
The variances of all hexanucleotide motifs and of known functional CREs were computed for these four species. To visualize the interspecies differences in the promoters pairwise, we employed quantile-quantile (QQ) plots of these variances.
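A minimal sketch of the variance computation and of the point pairs behind a QQ plot is given below. It assumes that the "variance" of a motif is the population variance of its per-promoter occurrence count across all promoters of one species; that definition, the function names and the 100-point quantile grid are illustrative assumptions, and the statistic used in this study may differ. Known functional CREs could be treated the same way by counting a supplied motif list instead of all 4^6 = 4096 hexamers.

```python
from itertools import product
from statistics import pvariance

# All 4^6 = 4096 hexanucleotide motifs.
HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]


def hexamer_counts(seq: str) -> dict[str, int]:
    """Count overlapping occurrences of every hexamer in one sequence."""
    counts = dict.fromkeys(HEXAMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 5):
        word = seq[i:i + 6]
        if word in counts:  # skips words containing N or other symbols
            counts[word] += 1
    return counts


def motif_variances(promoters: list[str]) -> dict[str, float]:
    """Assumed definition: variance of each hexamer's count across promoters."""
    per_promoter = [hexamer_counts(p) for p in promoters]
    return {h: pvariance([c[h] for c in per_promoter]) for h in HEXAMERS}


def quantile(sorted_vals: list[float], p: float) -> float:
    """Linear-interpolation quantile of pre-sorted data, 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def qq_points(x: list[float], y: list[float],
              n: int = 100) -> list[tuple[float, float]]:
    """Matching quantiles of two samples, i.e. the points of a QQ plot (n >= 2)."""
    xs, ys = sorted(x), sorted(y)
    probs = [i / (n - 1) for i in range(n)]
    return [(quantile(xs, p), quantile(ys, p)) for p in probs]
```

For two species with (hypothetical) variance tables `var_a` and `var_b`, `qq_points(list(var_a.values()), list(var_b.values()))` yields the points to plot against the diagonal y = x; points above the diagonal mark quantiles at which the second species shows larger motif variances.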
This approach revealed that the promoters of species with more compact genomes contain a higher information density.
2. Material and Methods
2.1. Plant genome information
The genome sequences of the chromosome pseudomolecules of the plant model organisms Arabidopsis thaliana, western balsam poplar (Populus trichocarpa), Sorghum bicolor and rice (Oryza sativa) were retrieved from GenBank's Plant Genomes Central (http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html). The GenBank accessions were: Arabidopsis thaliana, NC_003070.9, NC_003071.7, NC_003074.8, NC_003075.7 and NC_003076.8; Populus trichocarpa, NC_008467.1, NC_008468.1, NC_008469.1, NC_008470.1, NC_008471.1, NC_008472.1, NC_008473.1, NC_008474.1, NC_008475.1, NC_008476.1, NC_008477.1, NC_008478.1, NC_008479.1, NC_008480.1, NC_008481.1, NC_008482.1, NC_008483.1, NC_008484.1 and NC_008485.1; Oryza sativa, NC_008394.1, NC_008395.1, NC_008396.1, NC_008397.1, NC_008398.1, NC_008399.1, NC_008400.1, NC_008401.1, NC_008402.1, NC_008403.1, NC_008404.1 and NC_008405.1.
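The accessions listed above can also be fetched programmatically. The sketch below builds NCBI E-utilities `efetch` requests (database `nuccore`, FASTA text output); it is an alternative route, not necessarily how the sequences were obtained for this study, and the helper names are hypothetical. Sorghum is omitted because no GenBank accessions are given for it. NCBI's usage guidelines ask clients to limit their request rate and to identify themselves with `tool` and `email` parameters, which this sketch leaves out.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def accession_range(prefix: str, first: int, last: int, width: int,
                    version: int = 1) -> list[str]:
    """Expand a contiguous run of accessions, e.g. NC_008467.1..NC_008485.1."""
    return [f"{prefix}{n:0{width}d}.{version}" for n in range(first, last + 1)]


# Accessions as listed in the text (Sorghum came from PlantGDB instead).
GENOMES = {
    "Arabidopsis thaliana": ["NC_003070.9", "NC_003071.7", "NC_003074.8",
                             "NC_003075.7", "NC_003076.8"],
    "Populus trichocarpa": accession_range("NC_", 8467, 8485, 6),
    "Oryza sativa": accession_range("NC_", 8394, 8405, 6),
}


def efetch_url(accessions: list[str]) -> str:
    """Build an efetch URL that returns the given nucleotide records as FASTA."""
    query = urlencode({"db": "nuccore", "id": ",".join(accessions),
                       "rettype": "fasta", "retmode": "text"})
    return f"{EFETCH}?{query}"


def download_fasta(accessions: list[str]) -> str:
    """Fetch the records over the network and return the FASTA text."""
    with urlopen(efetch_url(accessions)) as resp:
        return resp.read().decode("ascii")
```

Whole chromosomes are large, so in practice one request per accession (or per small batch) is gentler on the service than a single request for an entire genome.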
The genomic sequences of Sorghum bicolor 1 v4 were retrieved from the Sorghum bicolor Genome at the Plant Genome Database, PlantGDB (http://www.plantgdb.org/SbGDB/).
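Whichever source is used, the downloaded pseudomolecules arrive as multi-record FASTA. A minimal, dependency-free reader is sketched below. It assumes standard FASTA ('>' header lines followed by sequence lines) and keeps the full header text, which may need trimming to a chromosome identifier depending on the file.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from multi-record FASTA lines.

    Accepts any iterable of lines, e.g. an open text file.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines before any header are skipped
    if header is not None:
        yield header, "".join(chunks)
```

For example, `with open(path) as fh: chromosomes = dict(read_fasta(fh))`, with a hypothetical local `path`, maps each header to its full chromosome sequence.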
Promot-