detailed account of the most important types, principles, and algorithms. An
encyclopedia that covers most of the important molecular descriptors can be found
in Ref. [1].
A molecular descriptor is an abstract, in most cases numerical, property of a molec-
ular structure, derived by some algorithm describing a specific aspect of a compound.
There are many ways to define descriptor classes. The most important criterion is
the structural representation used as input. The simplest types are zero- and
one-dimensional descriptors (0D and 1D), which depend only on the molecular
formula, such as the molecular mass or the numbers of specific elements. The net
charge of a molecule is often regarded as a 1D descriptor. Most descriptors
consider the molecular topology (i.e., the structural formula). Such descriptors,
which include most of the graph theory-based descriptors, are considered
two-dimensional (2D). Descriptors that also take the spatial structure into
account are defined as three-dimensional (3D). This class includes, for instance,
molecular interaction field (MIF)-based approaches, but also methods that make
use of Euclidean distances. Further descriptor classes that have been introduced
consider, for example, different conformations or molecular dynamics. Their
dimensionality cannot be expressed in a similarly intuitive way; acronyms such as
four-dimensional (4D) or five-dimensional (5D) are sometimes used for such methods.
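As a small illustration of descriptors that depend only on the molecular formula, the following Python sketch derives element counts and the molecular mass from a flat formula string. The function names, the restriction to formulas without parentheses or charges, and the short table of rounded atomic weights are assumptions made for this example and are not taken from the chapter.

```python
import re
from collections import Counter

# Rounded standard atomic weights in g/mol for a few common elements.
# This short table is an assumption of the sketch; extend it as needed.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def element_counts(formula: str) -> Counter:
    """Count the atoms of each element in a flat formula such as 'C2H6O'.

    Only element symbols followed by an optional integer are handled;
    parentheses, charges, and isotopes are out of scope for this sketch.
    """
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def molecular_mass(formula: str) -> float:
    """Sum the atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in element_counts(formula).items())


# Ethanol and dimethyl ether share the formula C2H6O.
counts = element_counts("C2H6O")
mass = molecular_mass("C2H6O")
```

Because ethanol and dimethyl ether share the formula C2H6O, both receive identical values here (two carbons, six hydrogens, one oxygen, and a mass of about 46.07 g/mol). Distinguishing such isomers requires descriptors that take at least the molecular topology into account.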
Most of the descriptors we will present in this chapter are at least 2D and therefore
make use of the molecular topology. In such approaches, a molecule is often regarded
as a graph annotated with complex properties, often using an unrestricted label
alphabet. This flexible definition, which also covers feature-reduced molecular
graphs, allows us to apply all kinds of graph-based structured data algorithms [2].
Definition 4.1: Given a node label alphabet L_V and an edge label alphabet L_E, we
define a directed attributed graph g by the four-tuple g = (V, E, μ, ν), where
• V defines a finite set of nodes
• E ⊆ V × V denotes a set of edges
• μ : V → L_V denotes a node labeling function
• ν : E → L_E denotes an edge labeling function
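The four-tuple of the definition can be sketched as a small Python data structure. The class name, the method names, and the choice to store the labeling functions μ and ν as dictionaries are assumptions made for illustration; representing an undirected chemical bond as a pair of opposite directed edges is likewise one possible convention, not one prescribed by the definition.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


@dataclass
class AttributedGraph:
    """Directed attributed graph g = (V, E, mu, nu); names are hypothetical."""

    nodes: Set[Node] = field(default_factory=set)                 # V
    edges: Set[Edge] = field(default_factory=set)                 # E, a subset of V x V
    node_label: Dict[Node, object] = field(default_factory=dict)  # mu: V -> node labels
    edge_label: Dict[Edge, object] = field(default_factory=dict)  # nu: E -> edge labels

    def add_node(self, v: Node, label: object) -> None:
        self.nodes.add(v)
        self.node_label[v] = label

    def add_edge(self, u: Node, v: Node, label: object) -> None:
        # Enforce E as a subset of V x V: both endpoints must already be nodes.
        if u not in self.nodes or v not in self.nodes:
            raise ValueError("both endpoints must be in V")
        self.edges.add((u, v))
        self.edge_label[(u, v)] = label

    def add_bond(self, u: Node, v: Node, label: object) -> None:
        # Assumption: an undirected bond is stored as two directed edges.
        self.add_edge(u, v, label)
        self.add_edge(v, u, label)


# Hydrogen-suppressed ethanol: C(0)-C(1)-O(2)
g = AttributedGraph()
g.add_node(0, "C")
g.add_node(1, "C")
g.add_node(2, "O")
g.add_bond(0, 1, "single")
g.add_bond(1, 2, "single")
```

In this sketch the two bonds of the ethanol skeleton yield four directed edges, and looking up `g.node_label[2]` or `g.edge_label[(1, 2)]` corresponds to evaluating μ and ν.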
The set V of nodes can be regarded as a set of node attributes of size |V|.
The set E of edges defines the structure and (edge) density of the graph. A
connection from node u ∈ V to node v ∈ V is formed by e = (u, v), if e ∈ E. A
labeling function allows integrating information on the nodes or edges by using
L_V and L_E.
In theory, there is no restriction to the label alphabet. Nevertheless, for practical
reasons the label alphabet is restricted to a vector space of a limited dimension,
L = R^n, or a discrete set of symbols, L = {s_1, ..., s_k}. Other definitions of
labels might also contain information such as strings, trees, or graphs, which
allows a more flexible encoding, because restricting the alphabet may impose
constraints on the application domain.
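The two practical label alphabets can be contrasted in a short Python sketch. The symbol set, the choice of atomic number and formal charge as vector components, and the one-hot encoding that maps discrete symbols into a vector space are all assumptions chosen for this example; the chapter does not prescribe any particular encoding.

```python
from typing import Dict, List, Sequence

# Hypothetical discrete label alphabet L = {s_1, ..., s_k} of element symbols.
SYMBOLS: Sequence[str] = ("C", "N", "O")


def one_hot(symbol: str, alphabet: Sequence[str] = SYMBOLS) -> List[float]:
    """Map a discrete symbol to a vector in R^k (one-hot encoding).

    One-hot encoding is a common way to embed a symbol alphabet in a
    vector space; it is used here only as an illustration.
    """
    if symbol not in alphabet:
        raise KeyError(f"{symbol!r} is not in the label alphabet")
    return [1.0 if s == symbol else 0.0 for s in alphabet]


# Node labels drawn from the discrete alphabet (carbon monoxide, C-O).
symbol_labels: Dict[int, str] = {0: "C", 1: "O"}

# Node labels drawn from a vector space, here R^2 with the assumed
# components (atomic number, formal charge); charges are set to zero
# for simplicity.
vector_labels: Dict[int, List[float]] = {0: [6.0, 0.0], 1: [8.0, 0.0]}

# The discrete labels embedded in R^3 via one-hot encoding.
encoded = {node: one_hot(symbol) for node, symbol in symbol_labels.items()}
```

A symbol outside the fixed alphabet raises an error in this sketch, which illustrates how a restricted alphabet constrains the molecules that can be represented.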
Although various labeling functions for molecular graphs are possible, a standard
definition is still under discussion (http://blueobelisk.sourceforge.net). Due to
differences in chemoinformatics perception