data:image/s3,"s3://crabby-images/6455b/6455b849896a07b3fe4dbb65a5eaca3032906cc3" alt=""
638 P Parameterized Matching
ancestor information (which can be computed once in lin-
ear time [11]), one can find the longest common prefix
of the pattern and the corresponding suffix (in constant
time). There must be a mismatch immediately afterwards.
Thealgorithmjumpsoverthemismatchandrepeatsthe
process, taking into consideration the offsets of the pattern
and suffix.
When attempting to apply this technique to a param-
eterized suffix tree, it fails. To illustrate this, consider the
first matching substring (up until the first error) and the
next matching substring (after the error). Both of these
substrings p-match the substring of the text that they are
aligned with. However, it is possible that combined they
do not form a p-match. See the example below. In the ex-
ample abab p-matches cdcd followed by a mismatch and
subsequently followed by abaa p-matching efee. However,
different ’s are required for the local p-matches. This ex-
ample also emphasizes why the definition of [8]isasim-
plification. Specifically, each local p-matching substring is
one replacement, i. e., abab with cdcd is one replacement
and abaa with efee is one more replacement. However, the
definition of [3] captures the globality of the parameter-
ized matching, not allowing, in this case, abab to p-match
to two different substrings.
p = ababaabaa
t = cdcddef ee
In [12]theproblemofparameterized matching with
kmismatcheswas considered. The parameterized match-
ing problem with k mismatches seeks all locations i
in text t where the minimal -mismatch between p to
t
i
t
i+m1
has at most k mismatches. An O(nk
1:5
+
mk log m) time algorithm was presented in [12]. At the
base of the algorithm, i. e., for the case where jpj = jtj = m,
an O(m + k
1:5
) algorithm is used based on maximum
matching algorithms. Then the algorithm uses a doubling
scheme to handle the growing distance between potential
parameterized matches (with at most k mismatches). Also
shownin[12] is a strong relationship between maximum
matching algorithms in sparse graphs and parameterized
matching with k errors.
The rigid, but more realistic, definition for the Ham-
ming distance version given in [3] can be naturally ex-
tended to the edit distance. Lately, it was shown that
this problem is nondeterministic polynomial-time com-
plete [15].
Applications
Parameterized matching has applications in code duplica-
tion detection in programming languages, in homework
plagiarism detection, and in image processing, among oth-
ers [1,4].
Cross References
Multidimensional String Matching
Sequential Approximate String Matching
Sequential Exact String Matching
Suffix Tree Construction in Hierarchical Memory
Suffix Tree Construction in RAM
Recommended Reading
1. Amir, A., Aumann, Y., Cole, R., Lewenstein, M., Porat, E.: Func-
tion matching: Algorithms, applications and a lower bound. In:
Proc. of the 30th International Colloquium on Automata, Lan-
guages and Programming (ICALP), 2003 pp. 929–942
2. Amir, A., Farach, M., Muthukrishnan, S.: Alphabet dependence
in parameterized matching. Inf. Process. Lett. 49, 111–115
(1994)
3. Apostolico, A., Erdös, P., Lewenstein, M.: Parameterized match-
ing with mismatches. J. Discret. Algorithms 5(1), 135–140
(2007)
4. Baker, B.S.: A theory of parameterized pattern matching: algo-
rithms and applications. In: Proc. 25th Annual ACM Symposium
on the Theory of Computation (STOC), 1993, pp. 71–80
5. Baker, B.S.: Parameterized pattern matching by Boyer-Moore-
type algorithms. In: Proc. 6th Annual ACM-SIAM Symposium
on Discrete Algorithms (SODA), 1995, pp. 541–550
6. Baker, B.S.: Parameterized pattern matching: Algorithms and
applications. J. Comput. Syst. Sci. 52(1), 28–42 (1996)
7. Baker, B.S.: Parameterized duplication in strings: Algorithms
and an application to software maintenance. SIAM J. Comput.
26(5), 1343–1362 (1997)
8. Baker, B.S.: Parameterized diff. In: Proc. 10th Annual ACM-SIAM
Symposium on Discrete Algorithms (SODA), 1999, pp. 854–855
9. Cole, R., Hariharan, R.: Faster suffix tree construction with miss-
ing suffix links. In: Proc. 32nd ACM Symposium on Theory of
Computing (STOC), 2000 pp. 407–415
10. Fredriksson, K., Mozgovoy, M.: Efficient parameterized string
matching. Inf. Process. Lett. 100(3), 91–96 (2006)
11. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest com-
mon ancestor. J. Comput. Syst. Sci. 13, 338–355 (1984)
12. Hazay, C., Lewenstein, M., Sokol, D.: Approximate parameter-
ized matching. ACM Trans. Algorithms 3(3) (2007)
13. Hazay, C., Lewenstein, M., Tsur, D.: Two dimensional parame-
terized matching. In: Proc. of 16th Symposium on Combinato-
rial Pattern Matching (CPM), 2005, pp. 266–279
14. Idury, R.M., Schäffer, A.A.: Multiple matching of parametrized
patterns. Theor. Comput. Sci. 154(2), 203–224 (1996)
15. Keller, O., Kopelowitz, T., Lewenstein, M.: Parameterized LCS
and edit distance are NP-Complete. Manuscript
16. Kosaraju, S.R.: Faster algorithms for the construction of pa-
rameterized suffix trees. In: Proc. 36th Annual Symposium on
Foundations of Computer Science (FOCS), 1995, pp. 631–637
17. Landau, G.M., Vishkin, U.: Fast string matching with k differ-
ences. J. Comput. Syst. Sci. 37(1), 63–78 (1988)
18. McCreight, E.M.: A space-economical suffix tree construction
algorithm. J. ACM 23, 262–272 (1976)