Space-Efficient computation of maximal and supermaximal repeats in genome sequences

Authors:
Timo Beller;Katharina Berger;Enno Ohlebusch
Affiliations:
Institute of Theoretical Computer Science, University of Ulm, Ulm, Germany;Institute of Theoretical Computer Science, University of Ulm, Ulm, Germany;Institute of Theoretical Computer Science, University of Ulm, Ulm, Germany
Venue:
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Year:
2012

Citing 15
Cited 1

Algorithms on strings, trees, and sequences: computer science and computational biology

Algorithms on strings, trees, and sequences: computer science and computational biology
On maximal repeats in strings

Information Processing Letters
High-order entropy-compressed text indexes

SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Opportunistic data structures with applications

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Replacing suffix trees with enhanced suffix arrays

Journal of Discrete Algorithms - SPIRE 2002
Computing all repeats using suffix arrays

Journal of Automata, Languages and Combinatorics - Special issue: Selected papers of the 13th Australasian workshop on combinatorial algorithms
A taxonomy of suffix array construction algorithms

ACM Computing Surveys (CSUR)
On-line construction of compact suffix vectors and maximal repeats

Theoretical Computer Science
Efficient computation of all perfect repeats in genomic sequences of up to half a gigabyte, with a case study on the human genome

Bioinformatics
A Linear-Time Burrows-Wheeler Transform Using Induced Sorting

SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Top-k ranked document search in general text databases

ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II
Computing the longest common prefix array based on the burrows-wheeler transform

SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Efficient Maximal Repeat Finding Using the Burrows-Wheeler Transform and Wavelet Tree

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Efficient computation of substring equivalence classes with suffix arrays

CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching

RACE: a scalable and elastic parallel system for discovering repeats in very long sequences

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

The identification of repetitive sequences (repeats) is an essential component of genome sequence analysis, and the notions of maximal and supermaximal repeats capture all exact repeats in a genome in a compact way. Very recently, Külekci et al. (Computational Biology and Bioinformatics, 2012) developed an algorithm for finding all maximal repeats that is very space-efficient because it uses the Burrows-Wheeler transform and wavelet trees. In this paper, we present a new space-efficient algorithm for finding maximal repeats in massive data that outperforms their algorithm both in theory and practice. The algorithm is not confined to this task, it can also be used to find all supermaximal repeats or to solve other problems space-efficiently.