Space-Efficient Computation of Maximal and Supermaximal Repeats in Genome Sequences

  • Authors:
  • Timo Beller;Katharina Berger;Enno Ohlebusch

  • Affiliations:
  • Institute of Theoretical Computer Science, University of Ulm, Ulm, Germany;Institute of Theoretical Computer Science, University of Ulm, Ulm, Germany;Institute of Theoretical Computer Science, University of Ulm, Ulm, Germany

  • Venue:
  • SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
  • Year:
  • 2012

Abstract

The identification of repetitive sequences (repeats) is an essential component of genome sequence analysis, and the notions of maximal and supermaximal repeats capture all exact repeats in a genome in a compact way. Very recently, Külekci et al. (Computational Biology and Bioinformatics, 2012) developed an algorithm for finding all maximal repeats that is very space-efficient because it uses the Burrows-Wheeler transform and wavelet trees. In this paper, we present a new space-efficient algorithm for finding maximal repeats in massive data that outperforms their algorithm both in theory and in practice. The algorithm is not confined to this task; it can also be used to find all supermaximal repeats or to solve other problems space-efficiently.
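
The abstract does not define maximal and supermaximal repeats, so the following minimal sketch may help make the two notions concrete. It assumes the common textbook definitions: a maximal repeat is a substring with two occurrences that differ in both the preceding and the following character, and a supermaximal repeat is a maximal repeat that is not a substring of any other maximal repeat. The function names `maximal_repeats` and `supermaximal_repeats` are hypothetical, and the code is a brute-force illustration of the definitions only. It is not the space-efficient algorithm of the paper and uses neither the Burrows-Wheeler transform nor wavelet trees.

```python
from collections import defaultdict


def maximal_repeats(s):
    """Brute-force set of maximal repeats of s (assumed textbook definition)."""
    n = len(s)

    # Record every start position of every substring (fine for short strings only).
    starts = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n + 1):
            starts[s[i:j]].append(i)

    def context(pos, length):
        # Characters directly left and right of an occurrence.
        # None marks a string boundary; two distinct occurrences can never
        # share the same boundary, so None only ever meets a real character.
        left = s[pos - 1] if pos > 0 else None
        right = s[pos + length] if pos + length < n else None
        return left, right

    result = set()
    for w, occ in starts.items():
        ctx = [context(p, len(w)) for p in occ]
        # w is maximal if some pair of occurrences differs on both sides.
        if any(a[0] != b[0] and a[1] != b[1]
               for k, a in enumerate(ctx) for b in ctx[k + 1:]):
            result.add(w)
    return result


def supermaximal_repeats(s):
    """Maximal repeats that are not contained in another maximal repeat."""
    reps = maximal_repeats(s)
    return {w for w in reps if not any(w != v and w in v for v in reps)}


print(sorted(maximal_repeats("banana")))       # ['a', 'ana']
print(sorted(supermaximal_repeats("banana")))  # ['ana']
```

The sketch enumerates all substrings, so its time and memory use grow far too quickly for genome-scale input; it is meant only to show what the output of a repeat-finding algorithm looks like on a toy string.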