
Suffix Array Construction S 921
practical algorithms is still going on with several different
nontrivial heuristics available [18] including very recent
ones [15].
Experimental Resul t s
An experimental comparison of a large number of suffix
array construction algorithms is presented in [18]. The
best algorithms in the comparison are the algorithm by
Maniscalco and Puglisi [15], which is the fastest but has
an ˝(n
2
) worst-case complexity, and a variant of the al-
gorithm by Burkhardt and Kärkkäinen [2], which is the
fastest among algorithms with good worst-case complex-
ity. Both algorithms are also space efficient. The algorithm
of Manzini and Ferragina [17] is still slightly more space
efficient and also very fast in practice.
There are also experiments with parallel [12]andex-
ternal memory algorithms [3]. Variants of the algorithm
by Kärkkäinen, Sanders and Burkhardt [7]showhighper-
formance and scalability in both cases.
Algorithms for computing the LCP array from the suf-
fix array are compared in [16].
Data Sets
The input to a suffix array construction algorithm is sim-
ply a text, so an abundance of data exists. Commonly used
text collections include the Canterbury Corpus at http://
corpus.canterbury.ac.nz/, the corpus compiled by Manzini
and Ferragina at http://www.mfn.unipmn.it/~manzini/
lightweight/corpus/, and the Pizza&Chili Corpus at http://
pizzachili.dcc.uchile.cl/texts.html.
URL to Code
The implementations of many of the algorithms men-
tioned here are publicly available, for example: http://
www.larsson.dogma.net/research.html [13], http://www.
mpi-sb.mpg.de/~sanders/programs/suffix/ [7], and http://
www.cs.helsinki.fi/juha.karkkainen/publications/cpm03.
tar.gz [2]. Manzini provides a package that computes the
LCP array and the BWT, too, at http://www.mfn.unipmn.
it/~manzini/lightweight/index.html.Thebzip2 com-
pression program (http://www.bzip.org/)computesthe
BWT through suffix array construction.
Cross References
Compressed Suffix Array
Compressed Text Indexing
String Sorting
Suffix Tree Construction in Hierarchical Memory
Suffix Tree Construction in RAM
Text Indexing
Two-Dimensional Pattern Indexing
Recommended Reading
1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees
with enhanced suffix arrays. J. Discret. Algorithms 2, 53–86
(2004)
2. Burkhardt, S., Kärkkäinen, J.: Fast lightweight suffix array con-
struction and checking. In: Proc. 14th Annual Symposium on
Combinatorial Pattern Matching. LNCS, vol. 2676, pp. 55–69.
Springer, Berlin/Heidelberg (2003)
3. Dementiev,R.,Mehnert,J.,Kärkkäinen,J.,Sanders,P.:Betterex-
ternal memory suffix array construction. ACM J. Exp. Algorith-
mics (2008) in press
4. Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the
sorting-complexity of suffix tree construction. J. Assoc. Com-
put. Mach. 47, 987–1011 (2000)
5. Gonnet, G., Baeza-Yates, R., Snider, T.: New indices for text: PAT
trees and PAT arrays. In: Frakes, W.B., Baeza-Yates, R. (eds.) In-
formation Retrieval: Data Structures & Algorithms. pp. 66–82
Prentice-Hall, Englewood Cliffs (1992)
6. Kärkkäinen, J.: Fast BWT in small space by blockwise suffix sort-
ing. Theor. Comput. Sci. 387, 249–257 (2007)
7. Kärkkäinen,J.,Sanders,P.,Burkhardt,S.:Linearworksuffixar-
ray construction. J. Assoc. Comput. Mach. 53, 918–936 (2006)
8. Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of
repeated patterns in strings, trees and arrays. In: Proc. 4th An-
nual ACM Symposium on Theory of Computing, pp. 125–136.
ACM Press, New York (1972)
9.Kasai,T.,Lee,G.,Arimura,H.,Arikawa,S.,Park,K.:Linear-
time longest-common-prefix computation in suffix arrays and
its applications. In: Proc. 12th Annual Symposium on Combi-
natorial Pattern Matching, vol. (2089) of LNCS. pp. 181–192.
Springer, Berlin/Heidelberg (2001)
10. Kim,D.K.,Sim,J.S.,Park,H.,Park,K.:Constructingsuffixarrays
in linear time. J. Discret. Algorithms 3, 126–142 (2005)
11. Ko, P., Aluru, S.: Space efficient linear time construction of suffix
arrays.J.Discret.Algorithms3, 143–156 (2005)
12. Kulla, F., Sanders, P.: Scalable parallel suffix array construction.
In: Proc. 13th European PVM/MPI User’s Group Meeting. LNCS,
vol. 4192, pp. 22–29. Springer, Berlin/Heidelberg (2006)
13. Larsson, N.J., Sadakane, K.: Faster suffix sorting. Theor. Comput.
Sci. 387, 258–272 (2006)
14. Manber, U., Myers, G.: Suffix arrays: A new method for on-line
string searches. SIAM J. Comput. 22, 935–948 (1993)
15. Maniscalco, M.A., Puglisi, S.J.: Faster lightweight suffix ar-
ray construction. In: Proc. 17th Australasian Workshop on
Combinatorial Algorithms, pp. 16–29. Univ. Ballavat, Ballavat
(2006)
16. Manzini, G.: Two space saving tricks for linear time LCP ar-
ray computation. In: Proc. 9th Scandinavian Workshop on
Algorithm Theory. LNCS, vol. 3111, pp. 372–383. Springer,
Berlin/Heidelberg (2004)
17. Manzini, G., Ferragina, P.: Engineering a lightweight suffix array
construction algorithm. Algorithmica 40, 33–50 (2004)
18. Puglisi, S., Smyth, W., Turpin,A.: A taxonomy of suffix array con-
struction algorithms. ACM Comput. Surv. 39(2), Article 4, 31
pages (2007)