On-line construction of compact suffix vectors and maximal repeats

  • Authors:
  • Élise Prieur;Thierry Lecroq

  • Affiliations:
  • University of Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan Cedex, France;University of Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan Cedex, France

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2008

Quantified Score

Hi-index 5.23

Visualization

Abstract

A suffix vector of a string is an index data structure equivalent to a suffix tree. It was first introduced by Monostori et al. in 2001 [K. Monostori, Efficient computational approach to identifying overlapping documents in large digital collections, Ph.D. Thesis, Monash University, 2002; K. Monostori, A. Zaslavsky, H. Schmidt, Suffix vector: Space-and-time-efficient alternative to suffix trees, in: CRPITS'02: Proceedings of the 25th Australasian Computer Science Conference, vol. 4, Darlinghurst, Australia, 2002, Australian Computer Society, Inc. pp 157-166; K. Monostori, A. Zaslavsky, I. Vajk, Suffix vector: A space-efficient suffix tree representation, in: P. Eades, T. Takaoka (Eds.), Proceedings of the 12th International Symposium on Algorithms and Computation, in: Lecture Notes in Computer Science, vol. 2223, Springer-Verlag, Berlin, 2001, pp. 707-718. Christchurch, New Zealand]. They proposed a linear construction algorithm of an extended suffix vector, then another linear algorithm to transform an extended suffix vector into a more space-economical compact suffix vector. We propose an on-line linear algorithm for directly constructing a compact suffix vector. Not only do we show that it is possible to directly build a compact suffix vector but we will also show that this on-line construction can be faster than the construction of the extended suffix vector. Then we formalize the relation between suffix vectors and compact suffix automata which leads to an efficient method for computing maximal repeats using suffix vectors.