128 Handbook of Chemoinformatics Algorithms
Joint-occupancy J_0: The joint-occupancy counts the IPEs that occupy a specific
cell both for the actual compound and for the reference molecule. It is therefore
a kind of similarity measure to a reference molecule that takes the conformational
space of both compounds into account.
Self-occupancy S_0: The self-occupancy is calculated as the difference between the
absolute-occupancy and the joint-occupancy.
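The relation between the three occupancy measures can be illustrated with a minimal sketch. It assumes that each occurrence of one IPE type in a grid cell, collected over the conformational ensemble, is given as a cell index, and that the joint-occupancy of a cell is the smaller of the two occurrence counts. Both the input format and this reading of "joint" are assumptions made for illustration; the function name `occupancy_descriptors` is hypothetical.

```python
from collections import Counter

def occupancy_descriptors(compound_cells, reference_cells):
    """Return {cell: (absolute, joint, self)} for one IPE type.

    Each argument is an iterable of grid-cell indices, one entry per
    occurrence of the IPE in that cell over the conformational ensemble.
    """
    compound = Counter(compound_cells)    # absolute-occupancy of the compound
    reference = Counter(reference_cells)  # absolute-occupancy of the reference
    result = {}
    for cell, absolute in compound.items():
        # Assumed reading: occurrences shared by compound and reference.
        joint = min(absolute, reference.get(cell, 0))
        # Self-occupancy is the difference, as defined in the text.
        result[cell] = (absolute, joint, absolute - joint)
    return result

# Toy data: cell indices are (x, y, z) tuples on a hypothetical grid.
cells_compound = [(0, 0, 0), (0, 0, 0), (1, 0, 0), (2, 1, 0)]
cells_reference = [(0, 0, 0), (2, 1, 0), (2, 1, 0)]
descriptors = occupancy_descriptors(cells_compound, cells_reference)
```

Each cell then contributes up to three descriptor values per IPE type, which is one reason the resulting feature space grows quickly.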
The resulting 4D features can be used as descriptors in QSAR modeling. As with
CoMFA [63], the large number of features favors machine learning methods that are
capable of dealing with large feature spaces, such as partial least squares.
All approaches that have been presented so far describe the ligand compound
in increasing complexity reaching its limit in the consideration of conformational
ensembles in the 4D QSAR paradigm. The 5D QSAR idea [85,86] goes beyond
that by incorporating information about the receptor structure and even its flexibility
regarding induced fit effects. This receptor-dependent QSAR (RD-QSAR) concept
does not necessarily need real information about the ligand's target. The construction
of receptor envelopes proposed in the work on 5D QSAR by Vedani and Dobler
[85,86] uses only the set of conformational ensembles of the ligands to infer a model of
the hypothetical receptor binding site. This is done using the concept of a hypothetical
receptor surface model originally published by Hahn [82,83]. The receptor surface
model is extended in this approach to incorporate induced fit effects. A ligand-specific
induced fit surface, called the “inner envelope,” is calculated for each molecule by
mapping the receptor surface that has been computed using all ligands onto the van
der Waals surface of the single molecule. The magnitude of the deformation measured
as the RMSD of corresponding surface points can be used to calculate a hypothetical
“induced fit” energy of this molecule. This energy is combined with other force field
energy terms into an equation that describes the binding energy of this molecule to
the hypothetical receptor.
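The deformation measure described above can be sketched as follows. The sketch assumes that the receptor envelope and the ligand-specific inner envelope are given as equally long lists of corresponding 3D points. The text does not state how the RMSD is converted into an energy, so the linear scaling in `induced_fit_energy` and its `scale` parameter are purely hypothetical placeholders and not the published functional form.

```python
import math

def surface_rmsd(envelope_points, inner_points):
    """RMSD between corresponding points of two surfaces."""
    if not envelope_points or len(envelope_points) != len(inner_points):
        raise ValueError("surfaces need the same, non-zero number of points")
    squared = sum(math.dist(p, q) ** 2
                  for p, q in zip(envelope_points, inner_points))
    return math.sqrt(squared / len(envelope_points))

def induced_fit_energy(rmsd, scale=1.0):
    # Hypothetical placeholder: energy assumed proportional to the deformation.
    return scale * rmsd

# Toy example: every surface point is displaced by one unit along z.
envelope = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
inner = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
rmsd = surface_rmsd(envelope, inner)
energy = induced_fit_energy(rmsd)
```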
Therefore, the inferred surface can be regarded as a QSAR equation. The equation
is trained by a genetic algorithm that varies the surface properties, which have been
randomly assigned, in order to optimize the fit of the model's energy equation to the
target values. Thus, the 5D QSAR approach differs from most of the previously
presented ideas in its understanding of descriptors. The surface
properties are varied in order to learn the model. Therefore, they can be regarded as
coefficients rather than features. If considered as descriptors, the interaction potentials
towards the ligand atoms are regarded as the values of the surface points. Thus, the
approach is to some extent the learning of a receptor binding pocket.
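The training idea, surface properties acting as coefficients that a genetic algorithm adjusts to fit target values, can be illustrated with a small sketch. It is not the published 5D QSAR procedure: the energy equation is replaced by a plain weighted sum, and the selection, crossover, and mutation operators as well as all names and parameters (`evolve`, `sigma`, `pop_size`, the toy data) are assumptions chosen for brevity.

```python
import random

def predicted_energy(coeffs, interactions):
    # Stand-in for the model's energy equation: weighted sum over surface points.
    return sum(c * x for c, x in zip(coeffs, interactions))

def fitness(coeffs, ligands, targets):
    # Negative sum of squared errors, so a higher value means a better fit.
    return -sum((predicted_energy(coeffs, x) - y) ** 2
                for x, y in zip(ligands, targets))

def evolve(ligands, targets, n_points, pop_size=30, generations=200,
           sigma=0.2, seed=0):
    """Evolve surface properties; requires n_points >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    # Surface properties start out randomly assigned.
    pop = [[rng.uniform(-1, 1) for _ in range(n_points)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, ligands, targets), reverse=True)
        history.append(fitness(pop[0], ligands, targets))
        parents = pop[: pop_size // 2]   # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_points)   # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_points)        # mutate one surface property
            child[i] += rng.gauss(0, sigma)
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda c: fitness(c, ligands, targets))
    return best, history

# Toy data: interaction values of four ligands with three surface points.
ligands = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 2.0, 1.0], [1.5, 0.5, 0.5]]
targets = [predicted_energy([0.8, -0.3, 0.5], x) for x in ligands]
best, history = evolve(ligands, targets, n_points=3)
```

Because the better half of each generation is carried over unchanged, the best fitness recorded in `history` never decreases. The evolved `best` vector plays the role of the learned coefficients, while the ligand interaction values stay fixed.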
4.6 IMPLICIT AND PAIRWISE GRAPH ENCODING: MCS MINING AND GRAPH KERNELS
4.6.1 MCS MINING
4.6.1.1 Maximum Common Subgraph
A maximum common subgraph (MCS) is the result of a search for maximum isomorphic
pairs (S, S′), such that S is a subgraph of G and S′ a subgraph of G′. From a formal