Faster Algorithms for Computing Maximal Multirepeats in Multiple Sequences

Authors:
Costas S. Iliopoulos; W. F. Smyth; Munina Yusufu
Affiliations:
Algorithm Design Group, Department of Computer Science, King's College London, The Strand, London WC2R 2LS, England. E-mail: csi@dcs.kcl.ac.uk and Digital Ecosystems & Business Intelligence Institu ...;
Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Ontario, Canada L8S 4K1. E-mail: smyth@mcmaster.ca and Digital Ecosystems & Business Intelligence Insti ...;
Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Ontario, Canada L8S 4K1 and Digital Ecosystems & Business Intelligence Institute, Curtin University GPO ...
Venue:
Fundamenta Informaticae - Special Issue on Stringology
Year:
2009

Abstract

A repeat in a string is a substring that occurs more than once. A repeat is extendible if every occurrence of the repeat has an identical letter either on the left or on the right; otherwise, it is maximal. A multirepeat is a repeat that occurs at least m$_{min}$ times (m$_{min}$ ⩾ 2) in each of at least q ⩾ 1 strings in a given set of strings. In this paper, we describe a family of efficient algorithms based on suffix arrays to compute maximal multirepeats under various constraints. Our algorithms are faster, more flexible and much more space-efficient than algorithms recently proposed for this problem. The results extend recent work by two of the authors computing all maximal repeats in a single string.
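
The definitions in the abstract can be illustrated with a small brute-force sketch in Python. It is not the suffix-array method of the paper: it simply enumerates every substring and tests the definitions directly, so it is only practical for tiny inputs. The function names (`occurrences`, `is_multirepeat`, `is_extendible`, `maximal_multirepeats`) are hypothetical. The sketch also assumes that extendibility is judged over the occurrences in every string of the set, and that an occurrence touching a string boundary has no letter on that side; the abstract does not spell out either point.

```python
from typing import List, Optional, Sequence, Set


def occurrences(s: str, u: str) -> List[int]:
    """Start positions of all (possibly overlapping) occurrences of u in s."""
    return [i for i in range(len(s) - len(u) + 1) if s.startswith(u, i)]


def is_multirepeat(strings: Sequence[str], u: str, m_min: int = 2, q: int = 1) -> bool:
    """True if u occurs at least m_min times in each of at least q strings."""
    qualifying = sum(1 for s in strings if len(occurrences(s, u)) >= m_min)
    return qualifying >= q


def is_extendible(strings: Sequence[str], u: str) -> bool:
    """True if every occurrence of u has the same letter on the left,
    or every occurrence has the same letter on the right.

    Assumption: occurrences in all strings are considered, and None
    stands for "no letter" at a string boundary.
    """
    lefts: Set[Optional[str]] = set()
    rights: Set[Optional[str]] = set()
    for s in strings:
        for i in occurrences(s, u):
            end = i + len(u)
            lefts.add(s[i - 1] if i > 0 else None)
            rights.add(s[end] if end < len(s) else None)
    same_left = len(lefts) == 1 and None not in lefts
    same_right = len(rights) == 1 and None not in rights
    return same_left or same_right


def maximal_multirepeats(strings: Sequence[str], m_min: int = 2, q: int = 1) -> Set[str]:
    """Brute force: test every distinct non-empty substring of the input."""
    candidates = {
        s[i:j]
        for s in strings
        for i in range(len(s))
        for j in range(i + 1, len(s) + 1)
    }
    return {
        u
        for u in candidates
        if is_multirepeat(strings, u, m_min, q) and not is_extendible(strings, u)
    }


if __name__ == "__main__":
    print(maximal_multirepeats(["abcabc"]))               # {'abc'}
    print(maximal_multirepeats(["abab", "xaby"], q=1))    # {'ab'}
    print(maximal_multirepeats(["abab", "xaby"], q=2))    # set()
```

In "abcabc", the repeats "a", "b", "c", "ab" and "bc" are all extendible (for example, every "a" is followed by "b"), so only "abc" is maximal. The sketch takes time polynomial in the total input length, which is the kind of cost that suffix-array-based algorithms are designed to avoid.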