Dynamic extended suffix arrays

  • Authors:
  • M. Salson;T. Lecroq;M. Léonard;L. Mouchard

  • Affiliations:
  • Université de Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan, France;Université de Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan, France;Université de Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan, France;Université de Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan, France and King's College London, Departement of Computer Science, Algorithm Group Design, Strand, London, WC2R 2LS, United Kingdo ...

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2010

Abstract

The suffix tree data structure was intensively described, studied and used in the eighties and nineties, its linear-time construction counterbalancing its high space requirements. An equivalent data structure, the suffix array, was described by Manber and Myers in 1990. This space-economical structure was neglected for more than a decade because its construction was too slow. Since 2003, several linear-time suffix array construction algorithms have been proposed, and the structure has gradually replaced the suffix tree in many string processing problems. All these constructions build the suffix array from the text, so any edit operation on the text requires building a brand new suffix array. In this article, we present an algorithm that updates the suffix array and the Longest Common Prefix (LCP) array when the text is edited (insertion, substitution or deletion of a letter or a factor). This algorithm is based on a recent four-stage algorithm developed for dynamic Burrows-Wheeler Transforms (BWT). To minimize the space complexity, we sample the Suffix Array, a technique used in BWT-based compressed indexes. We furthermore explain how this technique can be adapted to maintain a sample of the Extended Suffix Array, containing a sample of the Suffix Array, a sample of the Inverse Suffix Array and the whole LCP array. Our experiments show that it performs very well in practice, being quicker than the fastest suffix array construction algorithm.
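
For readers unfamiliar with the structures named in the abstract, the following is a minimal sketch of a suffix array, its inverse and the LCP array, built naively by sorting suffixes. It is not the authors' dynamic algorithm: the function names, the `$` terminator and the example strings are assumptions chosen for illustration, and the construction is quadratic or worse. It only shows the baseline that the abstract describes, where a single-letter edit forces a full rebuild.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def inverse_suffix_array(sa):
    """isa[pos] is the rank of the suffix starting at pos."""
    isa = [0] * len(sa)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa


def lcp_array(text, sa):
    """lcp[r] is the longest common prefix length of the suffixes
    at ranks r-1 and r; lcp[0] is set to 0 by convention."""
    n = len(text)
    lcp = [0] * len(sa)
    for r in range(1, len(sa)):
        a, b = sa[r - 1], sa[r]
        k = 0
        while a + k < n and b + k < n and text[a + k] == text[b + k]:
            k += 1
        lcp[r] = k
    return lcp


# Illustrative text; '$' is an assumed terminator smaller than every letter.
text = "banana$"
sa = suffix_array(text)
print(sa)                        # [6, 5, 3, 1, 0, 4, 2]
print(inverse_suffix_array(sa))  # [4, 3, 6, 2, 5, 1, 0]
print(lcp_array(text, sa))       # [0, 0, 1, 3, 0, 0, 2]

# Static baseline: inserting one letter means rebuilding everything.
edited = text[:2] + "c" + text[2:]   # "bacnana$"
print(suffix_array(edited))      # [7, 6, 1, 4, 0, 2, 5, 3]
```

Comparing the two suffix arrays shows that most suffixes keep their relative order after the insertion. A dynamic approach can exploit this by updating the existing arrays in place rather than sorting all suffixes again.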