270 Handbook of Chemoinformatics Algorithms
9.1 INTRODUCTION
“Molecular design” is a term with many connotations across a variety of fields.
One can certainly perform experimental work on various compounds and, by
analyzing their properties, propose a new substance with a desired property value;
this is one example of molecular design. In this chapter, however, we focus on the
in silico approach to molecular design, known by the catch-all term
“computer-aided molecular design,” or CAMD.
At its most basic level, CAMD is the application of computer-implemented algorithms
to design a molecule for a particular application. The term “molecular design” most
often brings therapeutics to mind: many researchers in industry and academia alike
work on drug design, and extensive effort has accordingly been devoted to developing
techniques specific to these systems [1–4]. Although it is less visible than the
marketing of the latest pharmaceutical, CAMD is also a popular and useful technique
in many other areas, such as polymer design [5,6] and solvent design [7].
In general, CAMD has a practically infinite solution space in which to search for
candidates. As we shall see in this chapter, even when the desired molecules are for
biological systems, the solution space is estimated to be at least $10^{60}$ [8], which
is a relatively tiny fraction of the space as a whole. Large search spaces are both a
blessing and a curse. With a vast number of compounds to evaluate, there is a greater
chance of finding a higher-quality or novel candidate, which could in turn lead to a
discovery with great economic impact for a particular company. On the other hand,
with such a big “haystack,” enormous time and effort could be spent on an unproductive
search that leads nowhere. Accordingly, efforts are made a priori to limit the search
space, using techniques such as full-fledged templating [9] or requiring the presence
of certain features in a candidate molecule [10].
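To make the effect of such a priori restrictions concrete, the following Python sketch
uses a deliberately simplified model in which a candidate is a fixed-length sequence of
fragments (an assumption that ignores branching, rings, and valence). It compares the
size of the unconstrained space with that of a single template leaving only two
positions open, in the spirit of templating [9], and then applies a required-feature
filter in the spirit of [10]. The fragment set, the template, and the function names
are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical fragment set. Toy model (assumption): a candidate is a fixed-length
# sequence of fragments; branching, rings, and valence rules are ignored.
FRAGMENTS = ["CH3", "CH2", "OH", "C=O", "NH2", "O"]


def unconstrained_count(n_fragments: int, length: int) -> int:
    """Every position is free, so there are n_fragments ** length candidates."""
    return n_fragments ** length


def fill_template(template, fragments):
    """Yield every candidate obtained by filling the open slots (None) of a template."""
    open_slots = [i for i, item in enumerate(template) if item is None]
    for choice in product(fragments, repeat=len(open_slots)):
        candidate = list(template)
        for slot, fragment in zip(open_slots, choice):
            candidate[slot] = fragment
        yield tuple(candidate)


# Hypothetical 8-position template with two open slots.
template = ["CH3", None, "CH2", "CH2", "C=O", "CH2", None, "CH3"]
candidates = list(fill_template(template, FRAGMENTS))

# Required-feature constraint: keep only candidates containing a hydroxyl fragment.
with_oh = [c for c in candidates if "OH" in c]

print(unconstrained_count(len(FRAGMENTS), len(template)))  # 1679616
print(len(candidates))                                     # 36
print(len(with_oh))                                        # 11
```

Fixing six of the eight positions shrinks the toy space from 6^8 to 6^2 candidates,
and the feature requirement removes a further 25 of those 36. Real structure
generators enforce chemical validity, so the absolute numbers say nothing about actual
chemical space; only the multiplicative effect of fixing positions carries over.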
The two most visible industries using CAMD are the chemical process industry
and the pharmaceutical industry. While both solve CAMD problems, they differ in
substantial ways. For example, the chemical process industry regularly uses CAMD
for solvent design [7]. Solvents are designed to have specified properties for a
given application, and the candidates output by CAMD algorithms are scored on
predicted properties, most often estimated with group-contribution methods.
In the pharmaceutical industry, CAMD is often used in a de novo approach [11,12].
In its most popular implementation, a CAMD algorithm builds ligands within the active
site of a receptor, although in practice the term de novo has been used loosely to
encompass virtually any kind of computational drug design [11]. Hence, although the
algorithms used in both industries share two common features, computational
construction of molecules and scoring of the candidates, the scoring functions and,
ultimately, the algorithms employed to build (or revise) the selected candidates
differ. Several good reviews cover specific de novo design approaches in more
detail [11,12].
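The pattern the two industries share, generating candidates and then scoring them, can
be sketched in Python for a solvent-design-style problem. In this sketch a candidate is
a set of functional-group counts n_i, a property is estimated by a linear
group-contribution model $P = P_0 + \sum_i n_i c_i$, and candidates are ranked by their
distance from a target value. The group table, the contributions c_i, the constant
P_0, and the function names are hypothetical placeholders, not a published parameter
set; the feasibility check is a simple valence balance for acyclic structures, in which
a tree of N groups has N - 1 bonds.

```python
from itertools import product

# Hypothetical group table: group -> (contribution c_i, free valence v_i).
# The numbers are placeholders, NOT values from a published group-contribution method.
GROUPS = {
    "CH3": (23.0, 1),
    "CH2": (23.0, 2),
    "OH": (90.0, 1),
    "C=O": (75.0, 2),
}
P0 = 200.0  # hypothetical model constant


def predict(counts: dict) -> float:
    """Linear group-contribution estimate: P0 + sum_i n_i * c_i."""
    return P0 + sum(n * GROUPS[g][0] for g, n in counts.items())


def is_feasible(counts: dict) -> bool:
    """Valence balance (necessary condition) for an acyclic structure.

    A tree of N groups has N - 1 bonds, each using two free valences, so
    sum_i n_i * v_i == 2 * (N - 1), i.e. sum_i n_i * (2 - v_i) == 2.
    """
    return sum(n * (2 - GROUPS[g][1]) for g, n in counts.items()) == 2


def enumerate_candidates(max_each: int):
    """Generate every feasible combination using 0..max_each copies of each group."""
    names = list(GROUPS)
    for combo in product(range(max_each + 1), repeat=len(names)):
        counts = {g: n for g, n in zip(names, combo) if n > 0}
        if counts and is_feasible(counts):
            yield counts


def design(target: float, max_each: int = 3, top: int = 3):
    """Score candidates by |predicted - target| and return the best few."""
    scored = [(abs(predict(c) - target), c) for c in enumerate_candidates(max_each)]
    scored.sort(key=lambda pair: pair[0])  # sort on the score only, never on dicts
    return scored[:top]


for error, counts in design(target=336.0):
    print(f"{counts}  predicted={predict(counts):.1f}  error={error:.1f}")
```

Under these placeholder parameters the target 336.0 is matched exactly by one CH3, one
CH2, and one OH group. A real design problem would use validated contributions, several
property targets or constraints, and often an optimization or stochastic search method
rather than exhaustive enumeration, since the number of combinations grows quickly.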
This chapter presents general molecular design methods in which compounds are
designed using structure–activity or structure–property relationships. Chapter 10
focuses on drug design and, in particular, de novo drug design. With de novo design,
compounds are constructed ab initio to complement a given target receptor. In
contrast to the next chapter, the techniques presented here are not limited to drugs