7.1 Construction of Alignment Space and Its Basic Theorems 221
7.1.2 The Alignment Space Under General Metric
Now we discuss how to measure the problem of generalization errors. There
are many types of metric for measuring generalization errors; e.g., the Lev-
enshtein distance, evolutionary distance, etc. The sequence alignment theory
is very well-established, and its application in bioinformatics is very broad.
Therefore, in this chapter we discuss in detail alignment distance and the cor-
responding alignment space. We also discuss the properties and applications
of the alignment space under general conditions.
General Metric Space
Let V be a finite or infinite set. Let V
+
= V ∪{−}be an expansion of V ,
which includes V and a virtual symbol “−”.
Definition 34. Let V
+
be the expansion of V and let d
+
(a, b) and d(a, b) de-
note the distance functions defined on V
+
and V , respectively. If their distance
functions are consistent, i.e., d(a, b)=d
+
(a, b) holds for all a, b ∈ V ,thenV is
called the topological subspace of V
+
, alternatively, V
+
is called the topological
expansion of V .
The main types of metric expansion space are as follows:
1. The finite set (discrete). In this case, V = V
q
= {0, 1, ···,q−1} is a finite
set, V
+
= V
q+1
= {0, 1, ···,q−1,q}.Asq changes, the finite set has a dif-
ferent meaning. Typically, for q =4,V
q
= {a, c, g, t} or V
q
= {a, c, g, u}
form the familiar nucleotide table; and for q = 20, V
q
is the familiar amino
acid table.
On a finite set V , the distance function d(a, b) is represented by a distance
matrix D =(d(a, b))
a,b∈V
, e.g., the Hamming matrix of (1.6), or the WT-
matrix of (1.7), etc. In this case, {V,D} is a metric space.
2. The bounded infinite set. For example, V =[0,q], q>0 is a bounded
interval, and a continuous set. There are many ways to express the distance
between pairs of elements. For example, the mean square error (MSE)
distance is: d(a, b)=(b − a)
2
, absolute error distance is: d(a, b)=|b − a|,
etc. Endowed with any one distance, {V,D} is a metric space. Using any
one of the above mentioned distance functions, the expansion distance
function defined on the expansion space V
+
can be extended as follows:
d
+
(a, b)=
⎧
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎨
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎩
d(a, b) , if both a, b =“−”,
0 , if both a, b =“−”,
q, if only one of a, b is “−”, while d(a, b)isthe
absolute error distance,
q
2
, if only one of a, b is “−”, while d(a, b)isthe
mean square error distance,
(7.1)
then {V
+
,D
+
} is a metric space.