data:image/s3,"s3://crabby-images/a0a2e/a0a2e1e5460f0cba6feecdeb057e81f8714f8445" alt=""
Dictionary Matching and Indexing (Exact and with Errors) D 243
The implication is that every node u in the original suffix
tree will only appear in log n error trees of the 1-errata trie
because each ancestor v of u is on the path from the root
to u and only log n such nodes are on different centroid
paths than their children (on ). Hence, u appears in only
log
k
n error trees in the k-errata trie. Therefore, the size of
the k-errata trie is O(n log
k
n). Creating the k-errata tries
in O(n log
k+1
n) can be done. To answer queries on a k-er-
rata trie, given the pattern with (at most) k don’t cares, the
0th level of the k-errata trie, i. e. the suffix tree, needs to
be traversed. This is to be done until the first don’t care,
at location j, in the pattern is reached. If at node v in the
0th level of the k-errata trie, enter the (1st level) error tree
hanging off of v and traverse this error tree from location
j + 2 of the pattern (until the next don’t care is met). How-
ever, the error tree hanging off of node v does not contain
the subtree hanging off of v that is along the centroid path.
Hence, continue traversing the pattern in the 0th level of
the k-errata trie, starting along the edge on the centroid
path leaving v (until the next don’t care is met). The search
is done recursively for k don’t cares and, hence, yields an
O(2
k
m)timesearch.
Recall that a solution for indexing text that supports
queries of a pattern with k don’t cares has been de-
scribed. Unfortunately, when indexing to support k mis-
match queries, not to mention k edit operation queries, the
traversal down a k-errata trie can be very time consuming
as frequent branching is required since an error may occur
at any location of the pattern. To circumvent this problem
search many error trees in parallel. In order to do so, the
error trees have to be grouped together. This needs to be
done carefully, see [6] for the full details. Moreover, edit
distance needs even more careful handling. The time and
space of the algorithms achieved in [6] are as follows:
Approximate Text Indexing: The data structure
for mismatches uses space O(n log
k
n), takes time
O(n log
k+1
n) to build, and answers queries in time
O((log
k
n)loglogn + m + occ). For edit distance, the
query time becomes O((log
k
n)loglogn + m +3
k
occ). It
must be pointed out that this result is mostly effective for
constant k.
Approximate Dictionary Matching: For k mis-
matches the data structure uses space O(n + d log
k
d), is
built in time O(n + d log
k+1
d), and has a query time of
O((m + log
k
d) log log n + occ). The bounds for edit dis-
tance are modified as in the indexing problem.
Applications
Approximate Indexing has a wide array of applications
in signal processing, computational biology, and text re-
trieval among others. Approximate Dictionary Matching
is important in digital libraries and text retrieval systems.
Cross References
Compressed Text Indexing
Indexed Approximate String Matching
Multidimensional String Matching
Sequential Multiple String Matching
Suffix Array Construction
Suffix Tree Construction in Hierarchical Memory
Suffix Tree Construction in RAM
Text Indexing
Two-Dimensional Pattern Indexing
Recommended Reading
1. Aho, A.V., Corasick, M.J.: Efficient string matching. Commun.
ACM 18(6), 333–340 (1975)
2. Alstrup, S., Brodal, G.S., Rauhe, T.: Pattern matching in dynamic
texts. In: Proc. of Symposium on Discrete Algorithms (SODA),
2000, pp. 819–828
3. Amir, A., Farach, M., Matias, Y.: Efficient randomized dictionary
matching algorithms. In: Proc. of Symposium on Combinatorial
Pattern Matching (CPM), 1992, pp. 259–272
4. Amir, A., Keselman, D., Landau, G.M., Lewenstein, N., Lewen-
stein, M., Rodeh, M.: Indexing and dictionary matching with
one error. In: Proc. of Workshop on Algorithms and Data Struc-
tures (WADS), 1999, pp. 181–192
5. Cole,R.,Kopelowitz,T.,Lewenstein,M.:Suffixtraysandsuffix
trists: Structures for faster text indexing. In: Proc. of Interna-
tional Colloquium on Automata, Languages and Programming
(ICALP), 2006, pp. 358–369
6. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and
indexing with errors and don’t cares. In: Proc. of the Sympo-
sium on Theory of Computing (STOC), 2004, pp. 91–100
7. Farach, M., Muthukrishnan, S.: Optimal parallel dictionary
matching and compression. In: Symposium on Parallel Algo-
rithms and Architecture (SPAA), 1995, pp. 244–253
8. Ferragina, P., Luccio, F.: Dynamic dictionary matching in exter-
nal memory. Inf. Comput. 146(2), 85–99 (1998)
9. Ferragina, P., Muthukrishnan, S., deBerg, M.: Multi-method dis-
patching: a geometric approach with applications to string
matching. In: Proc. of the Symposium on the Theory of Com-
puting (STOC), 1999, pp. 483–491
10. Idury, R.M., Schäffer, A.A.: Dynamic dictionary matching with
failure functions. In: Proc. 3rd Annual Symposium on Combi-
natorial Pattern Matching, 1992, pp. 273–284
11. Karkkainen,J.,Sanders,P.,Burkhardt,S.:Linearworksuffixar-
ray construction. J. ACM 53(6), 918–936 (2006)
12. Mehlhorn, K.: Dynamic binary search. SIAM J. Comput. 8(2),
175–198 (1979)
13. Sahinalp, S.C., Vishkin, U.: Efficient approximate and dynamic
matching of patterns using a labeling paradigm. In: Proc. of the
Foundations of Computer Science (FOCS), 1996, pp. 320–328
14. Weiner, P.: Linear pattern matching algorithm. In: Proc. of
the Symposium on Switching and Automata Theory, 1973,
pp. 1–11