1.1 Mutation and Alignment 9
algorithm. Segments of this local optimal alignment can be determined inde-
pendently, and a global optimal sequence alignment is obtained by connecting
the segments. Although both approaches are dynamic programming-based al-
gorithms, the Smith–Waterman algorithm greatly simplifies the Needleman–
Wunsch algorithm.
Following the development of the Smith–Waterman algorithm, sequence
alignment became a topic of great importance in bioinformatics. Many papers
were published, which not only improved the Smith–Waterman algorithm,
but also adapted it to apply to protein sequences. As a result, many types of
applied software based on the alignment algorithm were developed, and exist
today as powerful bioinformatics tools. The alignment of protein sequences is
more complex than the alignment of gene sequences because it is much more
difficult to develop scoring matrices (which quantify the similarities between
the sequences) for protein sequences. Researchers have proposed many types of
scoring systems to produce these scoring matrices, such as the PAM system
and the BLOSUM system. For the scoring matrix of the PAM system, the
probability of mutations based on the evolution time of the homologous family
is determined first, followed by the development of the scoring matrix. The
scoring matrix of the BLOSUM system finds the probability of mutations
based on the conservative regions of the homologous family, then develops
a scoring matrix. Therefore, depending on their requirements, users can choose
their scoring matrix based on either the PAM system or the BLOSUM system,
then combine the scoring matrix with a dynamic programming algorithm
to calculate the highest scoring functions. We will go into more detail later
regarding the scoring matrices of the PAM system and the BLOSUM system.
The reader is also referred to the literature [24, 40] for more information on
this topic.
Besides being adapted for use with proteins, there are many other applica-
tions of alignment algorithms. Nowadays, over ten types of software packages
exist for the purpose of database searching. Among them, BLAST and FASTA
are the most popular of those available as free downloads.
A dynamic programming-based algorithm needs to be aligned along both
the vertical axis and the horizontal axis. One must first assign the penalty
scores (or matching scores) at the crossed entries intersected by both the
vertical axis and horizontal axis, and the links optimized. Therefore, the com-
putational complexity of this algorithm can not be less than O(n
2
)(where
n is the length of the aligned sequence). For longer sequences, alignment and
searching are difficult tasks, although they are easily realized using the meth-
ods of computational science. For example, these alignment algorithms cannot
currently be performed on a PC if the length of the sequence exceeds 100 kbp.
For lengths exceeding 10 Mbp, the alignment algorithms cannot be performed
by any computers currently in existence. In 2002, a probability and statistics-
based alignment algorithm was proposed by the Nankai University group,
called the super pairwise alignment algorithm (SPA algorithm for short) [90].
For homologous sequences, the computational complexity of SPA is only O(n),