280 Handbook of Chemoinformatics Algorithms
9.5.1 MIXED-INTEGER LINEAR PROGRAMMING ALGORITHM FOR CAMD
Camarda and Sunderesan recently solved an optimization problem related to design-
ing valued-added soybean oil products [17]. They used an MILP approach in
conjunction with order 0, 1, and 2 (simple and valence) connectivity indices [54]. The
key step in their approach was the use of binary variables that allowed them to rewrite
the connectivity indices in terms of these variables. In their formulation, molecules
were represented by a partitioned adjacency matrix that describes the connectivity
of prespecified groups within a molecule (they had chosen 16 groups). The binary
variables, in turn, were related to entries in the partitioned adjacency matrix. This,
in conjunction with a Glover transformation, created objective function expressions
that were linear and, accordingly, easier to solve.
9.5.1.1 Molecular Representation
To represent a molecule, the data structure required is one that identifies whether
a particular group is present within a molecule, what other group(s) it is bonded
to and the multiplicity of that particular bond. What is used is two sets of binary
variables (re: 0 for absence; 1 for presence): W (which determines the presence or
absence of a group) and A (which determines connectivity and bond multiplicity).
Required a priori is a listing of groups to be used and their maximum occurrence
number in a molecule. For example [55], we can examine the binary variables W and
A for the molecule propane. Here, three basic groups are listed with their allowed
multiplicity given parenthetically: CH
3
(3), CH
2
(3), and CH (2). Accordingly, an
adjacency matrix can be written, but in this approach it is partitioned to cluster
the same groups together, resulting in a structure called a partitioned adjacency
matrix (A). We show the partitioned adjacency matrix for propane using the three
basic groups with allowed multiplicity for bond multiplicity 1 in Figure 9.3. Note
that similar matrices will exist for bond multiplicity 2 and 3, yet those will have
zero values for all elements in this instance since propane has no double or triple
bonds.
Since propane has two CH
3
groups, the first one is labeled “1,” while the sec-
ond one is labeled “2.” Such a labeling refers to the row/column in the partitioned
adjacency matrix. Likewise, the single CH
2
group in propane is labeled “6.” Note
that the other group available, CH, is not present in propane and has zeros for all
entries.
Now that the bonding has been accounted for using the partitioned adjacency
matrix, the presence and absence of a group is provided in an existence vector, W.In
this example, W would be the vector: W ={1, 1, 0, 0, 0, 1, 0, 0} where the labeling
mentioned previously is retained here. Once the binary variables A and W are written,
expressions for the molecular connectivity indices (both simple and valence) can be
given in terms of these variables.
9.5.1.2 Constraint Equations
With regard to constraints, molecular feasibility expressions based on the valence of
basic groups were written in terms of the binary variables. Additional expressions