Efficient seed computation revisited

Authors:
M. Christou, M. Crochemore, C. S. Iliopoulos, M. Kubica, S. P. Pissis, J. Radoszewski, W. Rytter, B. Szreder, T. Waleń
Affiliations:
M. Christou: King's College London, London WC2R 2LS, UK
M. Crochemore: King's College London, London WC2R 2LS, UK and Université Paris-Est, France
C. S. Iliopoulos: King's College London, London WC2R 2LS, UK and Digital Ecosystems & Business Intelligence Institute, Curtin University of Technology, Perth WA 6845, Australia
M. Kubica: Department of Mathematics, Informatics and Mechanics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland
S. P. Pissis: King's College London, London WC2R 2LS, UK
J. Radoszewski: Department of Mathematics, Informatics and Mechanics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland
W. Rytter: Department of Mathematics, Informatics and Mechanics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland and Department of Math. and Informatics, Copernicus University, ul. Chopina 12/18, ...
B. Szreder: Department of Mathematics, Informatics and Mechanics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland
T. Waleń: Department of Mathematics, Informatics and Mechanics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland and Laboratory of Bioinformatics and Protein Engineering, International Institute o ...
Venue:
Theoretical Computer Science
Year:
2013

References:

Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing.
Optimal superprimitivity testing for strings. Information Processing Letters.
An on-line string superprimitivity test. Information Processing Letters.
Efficient detection of quasiperiodicities in strings. Theoretical Computer Science.
The subtree max gap problem with application to parallel string covering. Information and Computation.
Computing the covers of a string in linear time. SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms.
Efficient string matching: an aid to bibliographic search. Communications of the ACM.
Of Periods, Quasiperiods, Repetitions and Covers. Structures in Logic and Computer Science, A Selection of Essays in Honor of Andrzej Ehrenfeucht.
Finding Maximal Quasiperiodicities in Strings. CPM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching.
A linear-time algorithm for a special case of disjoint set union. STOC '83 Proceedings of the fifteenth annual ACM symposium on Theory of computing.
Algorithms on Strings.
Space efficient linear time construction of suffix arrays. CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching.
Simple linear work suffix array construction. ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming.
Efficient seeds computation revisited. CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching.
A new succinct representation of RMQ-information and improvements in the enhanced suffix array. ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies.

Abstract

The notion of a cover is a generalization of the period of a string, and there are linear-time algorithms for finding the shortest cover. A seed is a more complicated generalization of periodicity: it is a cover of a superstring of the given string, and the shortest-seed problem is of much higher algorithmic difficulty. The problem is not well understood; no linear-time algorithm is known. In this paper we give linear-time algorithms for some of its versions: computing the shortest left-seed array, computing the longest left-seed array, and checking for seeds of a given length. The algorithm for the last problem is used to compute the seed array of a string (i.e., the shortest seeds for all the prefixes of the string) in O(n^2) time. We also describe a simpler alternative algorithm that computes the shortest seeds efficiently. As a by-product we obtain an O(n log(n/m))-time algorithm that checks whether the shortest seed has length at least m and finds the corresponding seed. We also correct some important details missing from the previously known shortest-seed algorithm of Iliopoulos et al. (1996) [14].
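To make the definitions of cover, seed and seed array concrete, here is a naive brute-force sketch in Python. It is not the paper's linear-time or O(n^2) method, only a slow definition-level checker for small examples. It assumes the usual convention that a seed must also be a factor (substring) of the string and no longer than it, and it relies on the equivalent view that every position of x must be covered by an occurrence of u inside x or by an occurrence hanging off either end. All function names (occurrences, is_cover, is_seed, shortest_cover, shortest_seed, seed_array) are hypothetical.

```python
# Naive, definition-level checkers for covers and seeds (illustration only).
# Assumption: a seed must be a factor (substring) of x and no longer than x.


def occurrences(x: str, u: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of u in x."""
    m = len(u)
    return [i for i in range(len(x) - m + 1) if x[i:i + m] == u]


def is_cover(x: str, u: str) -> bool:
    """True if every position of x lies inside some occurrence of u in x."""
    if not u:
        return False
    covered = [False] * len(x)
    for i in occurrences(x, u):
        for j in range(i, i + len(u)):
            covered[j] = True
    return all(covered)


def is_seed(x: str, u: str) -> bool:
    """True if u is a factor of x that covers some superstring of x.

    Each position of x must be covered by an occurrence of u inside x,
    by a prefix of x equal to a suffix of u (left overhang), or by a
    suffix of x equal to a prefix of u (right overhang).
    """
    n, m = len(x), len(u)
    if m == 0 or m > n or u not in x:  # assumed factor requirement
        return False
    covered = [False] * n
    for i in occurrences(x, u):
        for j in range(i, i + m):
            covered[j] = True
    for k in range(1, m):
        if x[:k] == u[m - k:]:  # occurrence hanging off the left end
            for j in range(k):
                covered[j] = True
        if x[n - k:] == u[:k]:  # occurrence hanging off the right end
            for j in range(n - k, n):
                covered[j] = True
    return all(covered)


def shortest_cover(x: str) -> str:
    """Shortest cover of x; any cover is a prefix of x, so try prefixes."""
    for m in range(1, len(x) + 1):
        if is_cover(x, x[:m]):
            return x[:m]
    return x


def shortest_seed(x: str) -> str:
    """Shortest seed of x, trying distinct factors in order of length."""
    for m in range(1, len(x) + 1):
        seen = set()
        for i in range(len(x) - m + 1):
            u = x[i:i + m]
            if u not in seen:
                seen.add(u)
                if is_seed(x, u):
                    return u
    return x


def seed_array(x: str) -> list[int]:
    """Length of the shortest seed of every prefix x[:1], ..., x[:n]."""
    return [len(shortest_seed(x[:i])) for i in range(1, len(x) + 1)]
```

For example, the shortest cover of ababa is aba, whereas ab is already a seed: the final a is covered by an occurrence of ab hanging off the right end, so ab covers the superstring ababab.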