RESTRICTION AND HOMING ENDONUCLEASES 363
3. STRUCTURE AND EVOLUTION OF REases AND HEases
Both REases and HEases are heterogeneous from the structural and evolutionary
point of view. All crystal structures of REases solved in the years 1986–2004
revealed the same three-dimensional fold of the PD-(D/E)XK superfamily of
nucleases (review: (Pingoud et al., 2005)). Therefore, it was often assumed that
all members of this group would share a common fold and a similar mechanism
of action. However, most REases show no evident sequence similarity to each
other or to any other proteins in the database, which makes sequence-based classi-
fication virtually impossible. Even with the availability of state-of-the-art bioin-
formatic tools, the assignment of REases with unknown structures to structural or
evolutionary families remains a challenging task. Thus far, bioinformatic analyses
of REase sequences have suggested that, although many REases indeed belong to the
PD-(D/E)XK superfamily, others may belong to unrelated superfamilies,
e.g. phospholipase D (PLD), Me, and GIY-YIG ((Aravind et al., 2000; Bujnicki
et al., 2001; Sapranauskas et al., 2000), review: (Bujnicki, 2001)). These
predictions have recently been supported by experimental analyses (unpublished data;
Saravanan et al., 2004). Of particular interest is the recent crystallographic structure
determination of the Mg²⁺-independent, EDTA-resistant nuclease R.BfiI, a relative
of phospholipase D (Grazulis et al., 2005). It remains to be seen whether the REases
that are still unassigned belong to other, structurally and mechanistically
distinct protein superfamilies.
Unlike REases, many HEases show readily detectable sequence similarity to
each other. Initially, HEases were thought to belong to four unrelated families:
LAGLIDADG, GIY-YIG, HNH and His-Cys box (reviews: (Belfort and Perlman,
1995; Stoddard, 2005)). However, analysis of experimentally determined structures
revealed a common active site and suggested that the HNH and His-Cys box families are
in fact divergent members of the same Me superfamily (Kuhlmann et al., 1999).
Thus far, the catalytic domains of well-characterized REases and HEases have been
found to be recruited from five different nuclease superfamilies: LAGLIDADG,
GIY-YIG, Me, PD-(D/E)XK, and PLD (Fig. 3), of which GIY-YIG and Me
are common to both REases and HEases. Interestingly, the results of preliminary
structure predictions for I-SspI, a known cyanobacterial HEase, suggest that it may be a
member of the PD-(D/E)XK superfamily (Bujnicki, unpublished data). Moreover,
we have recently identified a large family of PD-(D/E)XK-related proteins in
the genomes of freshwater Cyanobacteria (Feder and Bujnicki, 2005). In
some genomes these putative nucleases account for up to 2% of all
open reading frames; we predict that at least some of them may still be enzymatically
active and engaged in a process similar to intronless homing. These findings suggest
that PD-(D/E)XK may be the third superfamily comprising representatives of both
REases and HEases.
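The superfamily assignments summarized in the two preceding paragraphs can be restated as a simple set comparison. The following minimal Python sketch is illustrative only: the variable and function names are hypothetical, the set contents are transcribed from the text, and the PD-(D/E)XK entry for HEases encodes the tentative prediction (I-SspI and the putative cyanobacterial nucleases) rather than an experimentally established assignment.

```python
# Superfamilies of REase and HEase catalytic domains, transcribed from the text.
# "Me" denotes the superfamily that includes the HNH and His-Cys box HEases.
REASE_SUPERFAMILIES = {"PD-(D/E)XK", "PLD", "Me", "GIY-YIG"}
HEASE_SUPERFAMILIES = {"LAGLIDADG", "GIY-YIG", "Me"}

# Tentative assignment only: based on structure prediction for I-SspI and on
# the putative cyanobacterial nuclease family, not on experimental structures.
PREDICTED_HEASE_SUPERFAMILIES = {"PD-(D/E)XK"}


def shared(include_predictions: bool = False) -> set:
    """Return the superfamilies that contain both REases and HEases."""
    heases = HEASE_SUPERFAMILIES
    if include_predictions:
        heases = heases | PREDICTED_HEASE_SUPERFAMILIES
    return REASE_SUPERFAMILIES & heases


# Union of the well-characterized assignments: the five superfamilies.
all_superfamilies = REASE_SUPERFAMILIES | HEASE_SUPERFAMILIES

print(len(all_superfamilies))                    # 5
print(sorted(shared()))                          # ['GIY-YIG', 'Me']
print(sorted(shared(include_predictions=True)))  # ['GIY-YIG', 'Me', 'PD-(D/E)XK']
```

Keeping the predicted assignment in a separate set makes explicit that PD-(D/E)XK becomes the third shared superfamily only if the structure predictions are confirmed.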
Besides the catalytic domains, some REases and HEases possess additional
domains, often involved in DNA binding and specific sequence recognition. In
particular, HEases often feature multiple accessory domains tethered to the catalytic
domain that provide an extensive protein surface for the recognition of their extremely