Computing longest common substrings via suffix arrays

Authors:
Maxim A. Babenko; Tatiana A. Starikovskaya
Affiliations:
Moscow State University; Moscow State University
Venue:
CSR'08 Proceedings of the 3rd international conference on Computer science: theory and applications
Year:
2008

Abstract

Given a set of N strings A = {α1, ..., αN} of total length n over an alphabet Σ, one may ask to find, for each 2 ≤ K ≤ N, the longest substring β that appears in at least K strings in A. It is known that this problem can be solved in O(n) time with the help of suffix trees. However, the resulting algorithm is rather complicated (in particular, it involves answering certain least common ancestor queries in O(1) time). Also, its running time and memory consumption may depend on |Σ|. This paper presents an alternative, remarkably simple approach to the above problem, which relies on the notion of suffix arrays. Once the suffix array of some auxiliary O(n)-length string is computed, one needs only a simple O(n)-time postprocessing step to find the requested longest substring. Since a number of efficient and simple linear-time algorithms for constructing suffix arrays have recently been developed (with constants not depending on |Σ|), our approach seems to be quite practical.
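
The general idea of the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes that the auxiliary string is the concatenation of the inputs with a unique separator after each one, it builds the suffix array by naive sorting as a stand-in for a linear-time construction, and it answers a single value of K with a sliding window over the suffix array rather than all values of K in one O(n) pass. The function name `longest_common_substring` and all variable names are hypothetical.

```python
from collections import deque


def longest_common_substring(strings, k):
    """Return a longest substring occurring in at least k of the strings (k >= 2)."""
    num = len(strings)
    # Auxiliary string as a list of integers. Each input string is followed
    # by a unique separator (0..num-1) that is smaller than every shifted
    # character code, so a common prefix never crosses a string boundary.
    text, owner = [], []
    for idx, s in enumerate(strings):
        for ch in s:
            text.append(ord(ch) + num)
            owner.append(idx)
        text.append(idx)
        owner.append(-1)
    n = len(text)

    # Assumption: naive suffix sorting, used only to keep the sketch short.
    sa = sorted(range(n), key=lambda i: text[i:])
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r

    # Kasai-style LCP array: lcp[r] is the length of the longest common
    # prefix of the suffixes at ranks r - 1 and r.
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1

    # Sliding window over ranks num..n-1 (the first num ranks hold the
    # separator suffixes). A window [lo, hi] covering at least k distinct
    # strings shares a prefix of length min(lcp[lo+1..hi]); a monotone
    # deque keeps that minimum available in amortized O(1) time.
    count = [0] * num
    distinct = 0
    window_min = deque()
    best_len, best_pos = 0, 0
    lo = num
    for hi in range(num, n):
        while window_min and lcp[window_min[-1]] >= lcp[hi]:
            window_min.pop()
        window_min.append(hi)
        o = owner[sa[hi]]
        if count[o] == 0:
            distinct += 1
        count[o] += 1
        # Shrink from the left while the window still covers k strings.
        while distinct >= k:
            o_lo = owner[sa[lo]]
            if count[o_lo] == 1 and distinct == k:
                break
            count[o_lo] -= 1
            if count[o_lo] == 0:
                distinct -= 1
            lo += 1
        if distinct >= k:
            while window_min[0] <= lo:
                window_min.popleft()
            m = lcp[window_min[0]]
            if m > best_len:
                best_len, best_pos = m, sa[hi]
    return "".join(chr(c - num) for c in text[best_pos:best_pos + best_len])
```

For example, `longest_common_substring(["banana", "ananas", "bandana"], 2)` returns `"anana"`, and the same call with K = 3 returns `"ana"`. Calling the function once per K costs O(n) postprocessing per call after the arrays are built, which is weaker than the single O(n) pass for all K claimed in the abstract.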