350 P. Krokhmal et al.
The algorithm for the p-norm linear discrimination model (18.9) has been implemented in C++, and the ILOG CPLEX 10.0 solver has been used to solve its LP approximation as described in Sect. 18.3. The approximation accuracy has been set at 10^−5.
Wisconsin Breast Cancer Data Set This breast cancer database was obtained from the University of Wisconsin Hospitals by Dr. William H. Wolberg. Each entry in the data set is characterized by an ID number and 10 feature values, which were obtained by medical examination of certain breast tumors. The data set contains a total of 699 data points (records), but because some values are missing, only 682 data points are used in the experiment. The entire data set comprises two classes of data points: 444 (65.1%) data points represent benign tumors, and the remaining 238 (34.9%) points correspond to malignant cases.
To test the classification performance of the proposed p-norm classification model, we partitioned the original data set at random into training and testing sets in the ratio 2:1, so that the proportion between benign and malignant cases was preserved. In other words, the training set contained 2/3 of the benign and malignant points of the entire data set, and the testing set contained the remaining 1/3 of the benign and malignant cases. The p-norm discrimination model (i.e., its LP approximation) was solved using the training set as the data A and B, and the obtained linear separator was then used to classify the points in the testing set. For each fixed value of p in (18.9), this procedure has been repeated 10 times, and the average misclassification error rates for benign and malignant cases have been recorded. The cumulative misclassification rate was then computed as a weighted (0.651 to 0.349) average of the benign and malignant error rates (note that the weights correspond to the proportions of benign and malignant points in the entire database).
The value of the parameter p has been varied from p = 1.0 to p = 5.0 in steps of 0.1. Then, an "optimal" value of the parameter p has been selected as the one delivering the lowest cumulative average misclassification rate.
In addition to varying the parameter p, we have considered different weights δ1, δ2 in the p-norm linear separation model (18.9). In particular, the following combinations have been used: δ1 = k^p, δ2 = m^p; δ1 = k^(p−1), δ2 = m^(p−1); and δ1 = δ2 = 1.
Finally, the same method was used to compute cumulative misclassification rates for p = 1, i.e., the original model of Bennett and Mangasarian [5]. Table 18.1 displays the results of our computational experiments. It can be seen that the application of higher norms allows one to reduce the misclassification rates.
Other Data Sets Similar tests have also been run on the Pima Indians Diabetes data set, the Connectionist Bench (Sonar, Mines vs. Rocks) data set, and the Ionosphere data set, all of which can be obtained from the UCI Machine Learning Repository. Note that for these tests only the weights δ1 = δ2 = 1 and δ1 = k^p, δ2 = m^p in problem (18.9) are used. Table 18.2 reports the best average error rates obtained under various values of p for these three data sets, and compares them with the best results known in the literature for these particular data sets.