100 Handbook of Chemoinformatics Algorithms
An alternative method is to define an algorithm that generates a set of substructures
for a molecule. This can be regarded as the generation of a set of descriptors for a
specific molecule by an algorithm. This avoids an explicit definition of the substructure
set and thus allows us to reveal important but as yet unrecognized structural features. In
this case, it is necessary to introduce a metric that allows a quantitative comparison
of the resulting descriptor sets with variable cardinality for different molecules.
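One way to compare descriptor sets of different sizes can be sketched as follows. The text does not name a particular measure at this point, so the Tanimoto (Jaccard) coefficient is assumed here as a common choice; the function names and the descriptor labels are hypothetical and serve only as illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient |A & B| / |A | B| of two descriptor sets."""
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


def jaccard_distance(a: set, b: set) -> float:
    """Distance derived from the coefficient; 0 means identical sets."""
    return 1.0 - tanimoto(a, b)


# Hypothetical substructure descriptors generated for two molecules.
# The two sets have different cardinalities (4 and 3).
mol_a = {"C-C", "C-O", "C=O", "O-H"}
mol_b = {"C-C", "C-O", "C-N"}

similarity = tanimoto(mol_a, mol_b)      # 2 shared / 5 distinct descriptors
distance = jaccard_distance(mol_a, mol_b)
print(similarity, distance)
```

Because the coefficient is normalized by the size of the union, it stays between 0 and 1 regardless of how many descriptors the algorithm generates for each molecule.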
4.4.1 SUBSTRUCTURE TYPES AND GENERATION
4.4.1.1 Atom Types and Reduced Graphs
The fundamental building blocks of molecular graphs are atoms (the smallest
substructures that fulfill the above definition). An atom is usually considered an instance
of a specific chemical element type defining its physical properties (e.g., expected
mass, electronegativity, and number of electrons and protons). The chemical proper-
ties are expressed only vaguely by the element alone, because most properties related
to atomic interactions depend on the hybridization and the neighborhood of an atom.
Therefore, it is common to use a finer distinction of atoms of the same element leading
to the concept of an atom type.
In this chapter, we will consider an atom type as a structural pattern that denotes
which configurations of an atom (including specific properties like charge, hybridiza-
tion, isotope, etc.) and its intramolecular neighborhood can be considered as equal.
This concept is of special importance in the application of empirical force fields in
which the potential terms are evaluated using deviations from reference parameters,
precalculated ab initio or determined experimentally, that are considered favorable
for specific atom types (e.g., the optimal bond length between two sp3 carbons).
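To illustrate how atom types select force-field parameters, the following minimal sketch evaluates a harmonic bond-stretch term. The harmonic form is assumed here as a common functional form and is not prescribed by the text. The table BOND_PARAMS, the function bond_energy, and all numeric values are hypothetical placeholders and are not taken from any published force field.

```python
# Hypothetical parameter table keyed by an unordered pair of atom-type labels.
# Each entry holds (reference length r0 in angstrom,
#                   force constant k in kcal/(mol * angstrom^2)).
# The numbers are illustrative placeholders only.
BOND_PARAMS = {
    frozenset(["C.3"]): (1.53, 300.0),         # sp3 carbon to sp3 carbon
    frozenset(["C.3", "O.3"]): (1.43, 320.0),  # sp3 carbon to sp3 oxygen
}


def bond_energy(type_i: str, type_j: str, r: float) -> float:
    """Harmonic penalty 0.5 * k * (r - r0)^2 for a bond of length r."""
    # A frozenset makes the lookup independent of the order of the two types.
    r0, k = BOND_PARAMS[frozenset([type_i, type_j])]
    return 0.5 * k * (r - r0) ** 2


# At the reference length the term vanishes; a stretched bond is penalized.
print(bond_energy("C.3", "C.3", 1.53))
print(bond_energy("C.3", "C.3", 1.63))
```

The atom types act purely as lookup keys: two atoms of the same type share the same reference values, which is why the choice of the type dictionary directly affects the computed energies.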
The definition of a dictionary of atom types is a crucial step in many applications
of chemoinformatics and is a major contribution of chemical expert knowledge in a
computational framework.
Many atom-type dictionaries of differing granularity are available. Some
popular definitions are SYBYL atom types [24], which differentiate mainly regarding
hybridizations and element types, the Meng/Lewis definition [25], or the MacroModel
atom types [26], which extend the definition to specific atoms in substructures like
ring systems or amino acids using SMARTS patterns [27].
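A dictionary that distinguishes atoms mainly by element and hybridization can be sketched as below. The labels follow the SYBYL naming style (for example C.3 for an sp3 carbon), but the rule set is a deliberately simplified assumption and does not reproduce the full SYBYL definition. The Atom class and the atom_type function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    element: str
    hybridization: str      # "sp", "sp2" or "sp3"
    aromatic: bool = False


# Simplified assumption: only these elements are split by hybridization.
HYBRID_SUFFIX = {"sp3": "3", "sp2": "2", "sp": "1"}
TYPED_ELEMENTS = {"C", "N", "O"}
AROMATIC_ELEMENTS = {"C", "N"}


def atom_type(atom: Atom) -> str:
    """Return a SYBYL-style label; fall back to the bare element symbol."""
    if atom.aromatic and atom.element in AROMATIC_ELEMENTS:
        return f"{atom.element}.ar"
    if atom.element in TYPED_ELEMENTS and atom.hybridization in HYBRID_SUFFIX:
        return f"{atom.element}.{HYBRID_SUFFIX[atom.hybridization]}"
    return atom.element


# Heavy atoms of ethanol, described by hand for this example.
ethanol = [Atom("C", "sp3"), Atom("C", "sp3"), Atom("O", "sp3")]
print([atom_type(a) for a in ethanol])
```

Dictionaries that depend on the surrounding substructure would replace these simple rules with pattern matching on the molecular graph, for example with SMARTS expressions.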
Besides the incorporation of expert knowledge into a chemoinformatics frame-
work, atom types can be powerful features if they are regarded as binary descriptors.
This is the basis of many structural features that have to deal with the problem of whether
two atoms can be considered equal. For example, this plays an important role in the
computation of the cardinality of the intersection of atom-type sets needed by many
similarity measures for molecular fingerprints or in the definition of pharmacophoric
points.
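Treating atom types as binary descriptors can be sketched with a bit mask in which each bit marks the presence of one type. The dictionary order, the function names, and the type lists for the two example molecules are assumptions made for illustration only.

```python
# Assumed fixed ordering of a small, hypothetical atom-type dictionary.
ATOM_TYPES = ["C.3", "C.2", "C.ar", "N.3", "O.3", "O.2"]
BIT_INDEX = {label: i for i, label in enumerate(ATOM_TYPES)}


def fingerprint(types_in_molecule) -> int:
    """Encode the presence of each atom type as one bit of an integer."""
    bits = 0
    for label in types_in_molecule:
        bits |= 1 << BIT_INDEX[label]
    return bits


def popcount(bits: int) -> int:
    """Number of set bits, i.e., the cardinality of the encoded set."""
    return bin(bits).count("1")


# Atom types assigned by hand to the heavy atoms of two small molecules.
acetic_acid = fingerprint(["C.3", "C.2", "O.2", "O.3"])
ethanol = fingerprint(["C.3", "C.3", "O.3"])

# A bitwise AND keeps only the types present in both molecules.
common = popcount(acetic_acid & ethanol)
print(common)
```

The count of common bits, together with the bit counts of the individual fingerprints, is the quantity that set-based similarity coefficients are built from.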
An extension of the atom-type concept is the definition of substructure types (e.g.,
using SMARTS expressions) by regarding whole substructures like rings or functional
groups as atom types, whose properties reflect the properties of the substructure. This
representation is useful in molecular similarity calculations like Feature Trees [28]
to ensure that these substructures are only compared in bulk. Another advantage of