6 Searching Texts – But Fast! 49
1 2 3 4 5 6 7 8 ...
H a y s t a c k
j=m=4
d a y s
↓ j:=j−1
H a y s t a c k
j=3
d a y s
↓ j:=j−1
H
a y s t a c k
j=2
d a y s
↓ j:=j−1
H
a y s t a c k
| j=1
d
a y s
The figure to the left clarifies the
function of this little program when
searching text t from above for w =
days. Here, a green double arrow
represents a comparison of two iden-
tical symbols; a read bar connects
two symbols for which a mismatch
has been observed. Beginning with
w[4], the text is compared symbol by
symbol to w until either j becomes 0
(which is not the case in our exam-
ple, and would imply an occurrence
of w as part of the text) or the sym-
bols w[j]andt[j] just compared do
not match (which happens for j =1
in our example). These two condi-
tions are checked within the while-
loop of the program.
In general,
while (j>0) and (w[j]=t[j])doj := j − 1;
means that j is decreased by 1 as long as it is larger than 0 and the jth symbol
of the text is equal to the jth symbol of the word. Thus, in cases where the
first m symbols of the text do not match w, the second condition eventually
gets violated, leaving a value of j larger than 0. As a consequence, in line 4
of our program the command if (j =0) ... will not report an occurrence
(printing the text “Occurrence at position 1” is only a surrogate for any action
to be taken in case of an occurrence of w). If, on the contrary, all m symbols of
w match the first m symbols of t, then the while-loop terminates since j =0
holds. In this case our program reports success.
Since we have to find all occurrences of w as a substring of t,weobvi-
ously have to search other locations of t than just the beginning. In fact, w
may start at any position of t which has to be checked by our program. In
this
context, an
y position of t means that we have to expect an occurrence
of w at the second, third, . . . positions of t also. For the second position
we must decide if w[1] = t[2] and w[2] = t[3] and ... and w[m]=t[m +1]
hold. The third, fourth, . . . positions have to be examined analogously; the
(n − m + 1)th position is the last to be considered where w[m]andt[n]are
aligned. Considering position pos,wehavetocomparew[1] and t[pos], w[2] and
t[pos +1],...,w[m]andt[pos + m −1] (which our algorithm will do in reverse
order). By introducing an additional variable pos we can easily extend our
program to (according to our preliminary considerations) search for w at any
position of t (parts of the program adopted from above are printed in blue).