Computing all repeats using suffix arrays

Authors:
Frantisek Franěk;William F. Smyth;Yudong Tang
Affiliations:
Algorithms Research Group, Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada;Algorithms Research Group, Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada and School of Computing, Curtin University, Perth, Australia;Algorithms Research Group, Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada
Venue:
Journal of Automata, Languages and Combinatorics - Special issue: Selected papers of the 13th Australasian workshop on combinatorial algorithms
Year:
2003

Abstract

We describe an algorithm that identifies all the repeating substrings (tandem, overlapping, and split) in a given string x = x[1..n]. Given the suffix arrays of x and of the reverse of x, the algorithm runs in Θ(n) time and represents its output in Θ(n) space, either as a reduced suffix array (called an NE array) or as a reduced suffix tree (called an NE tree). The output substrings u are nonextendible (NE); that is, any extension of some occurrence of u in x, either to the left or to the right, yields a string (λu or uλ) that is unequal to the same extension of some other occurrence of u. Thus the number of substrings output is the minimum required to identify all the repeating substrings in x. The output can be used in a straightforward way to identify only those repeating substrings that satisfy some proximity or minimum length condition.
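
The NE condition can be illustrated with a small brute-force sketch in Python. This is not the Θ(n) suffix-array algorithm the abstract describes, only a direct check of the definition as read here: a substring u is reported if it occurs at least twice and its occurrences do not all share the same preceding letter nor the same following letter. The function name `ne_repeats`, the 0-based positions, and the treatment of a string boundary as a distinct "letter" are assumptions made for this illustration.

```python
from collections import defaultdict


def ne_repeats(x):
    """Brute-force sketch (hypothetical helper, not the paper's algorithm).

    Returns a dict mapping each nonextendible repeating substring of x
    to the sorted list of its 0-based start positions.
    """
    n = len(x)

    # Collect the start positions of every substring of x.
    occ = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n + 1):
            occ[x[i:j]].append(i)

    result = {}
    for u, starts in occ.items():
        if len(starts) < 2:
            continue  # not a repeat
        # Letters immediately left and right of each occurrence;
        # None marks a string boundary (assumed to block extension).
        left = {x[i - 1] if i > 0 else None for i in starts}
        right = {x[i + len(u)] if i + len(u) < n else None for i in starts}
        # u is nonextendible if neither side extends identically everywhere.
        if len(left) > 1 and len(right) > 1:
            result[u] = starts
    return result


if __name__ == "__main__":
    print(ne_repeats("abcabc"))  # {'abc': [0, 3]}
    print(ne_repeats("aaaa"))
```

In "abcabc" the shorter repeats a, b, c, ab, and bc all extend identically on at least one side, so only the tandem repeat abc is reported; this is the sense in which the NE substrings form a minimal description of all repeats. A length or proximity filter, as mentioned in the abstract, could be applied to the returned dictionary afterwards.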