Linear time algorithms for finding and representing all the tandem repeats in a string

  • Authors:
  • Dan Gusfield; Jens Stoye

  • Affiliations:
  • Department of Computer Science, University of California-Davis, Davis, CA; Universität Bielefeld, Technische Fakultät, 33594 Bielefeld, Germany and Department of Computer Science, University of California-Davis, Davis, CA

  • Venue:
  • Journal of Computer and System Sciences
  • Year:
  • 2004

Abstract

A tandem repeat (or square) is a string αα, where α is a non-empty string. We present an O(|S|)-time algorithm that operates on the suffix tree T(S) for a string S, finding and marking the endpoint in T(S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents all occurrences of tandem repeats in S, and can be used to efficiently solve many questions concerning tandem repeats and tandem arrays in S. This improves and generalizes several prior efforts to efficiently capture large subsets of tandem repeats.
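
To make the definition concrete, the following sketch enumerates tandem repeats by brute force. It is not the linear-time suffix-tree algorithm described in the abstract; it is a simple cubic-time baseline, and the function names `tandem_repeat_occurrences` and `tandem_repeat_types` are hypothetical names chosen for this illustration.

```python
def tandem_repeat_occurrences(s):
    """Return every (start, half_length) pair such that
    s[start : start + 2 * half_length] has the form alpha + alpha.

    Brute-force baseline for illustration only: it compares substrings
    directly and does NOT use a suffix tree or run in linear time.
    """
    n = len(s)
    found = []
    for start in range(n):
        # The repeated half alpha must be non-empty and fit twice in s.
        for half in range(1, (n - start) // 2 + 1):
            first = s[start:start + half]
            second = s[start + half:start + 2 * half]
            if first == second:
                found.append((start, half))
    return found


def tandem_repeat_types(s):
    """Return the set of distinct strings alpha + alpha that occur in s."""
    return {s[i:i + 2 * h] for i, h in tandem_repeat_occurrences(s)}


if __name__ == "__main__":
    print(tandem_repeat_occurrences("aaaa"))
    print(sorted(tandem_repeat_types("aaaa")))
```

The distinction between occurrences and distinct types matters here: a string can contain many occurrences of the same square, whereas the decorated suffix tree in the abstract marks the endpoint of each tandem repeat once and thereby represents all of its occurrences implicitly.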