Dynamic extended suffix arrays

Authors:
M. Salson;T. Lecroq;M. Léonard;L. Mouchard
Affiliations:
Université de Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan, France;Université de Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan, France;Université de Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan, France;Université de Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan, France and King's College London, Department of Computer Science, Algorithm Group Design, Strand, London, WC2R 2LS, United Kingdo ...
Venue:
Journal of Discrete Algorithms
Year:
2010

Citing 21
Cited 9

Algorithms on strings, trees, and sequences: computer science and computational biology

Algorithms on strings, trees, and sequences: computer science and computational biology
Suffix arrays: a new method for on-line string searches

SODA '90 Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms
A Space-Economical Suffix Tree Construction Algorithm

Journal of the ACM (JACM)
Constructing Suffix Trees On-Line in Linear Time

Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture - Information Processing '92, Volume 1 - Volume I
Optimal on-line search and sublinear time update in string matching

FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
Opportunistic data structures with applications

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Compact suffix array: a space-efficient full-text index

Fundamenta Informaticae - Special issue on computing patterns in strings
Compressed Index for Dynamic Text

DCC '04 Proceedings of the Conference on Data Compression
Succinct suffix arrays based on run-length encoding

Nordic Journal of Computing
Compressed full-text indexes

ACM Computing Surveys (CSUR)
Linear work suffix array construction

Journal of the ACM (JACM)
Compressed representations of sequences and full-text indexes

ACM Transactions on Algorithms (TALG)
A taxonomy of suffix array construction algorithms

ACM Computing Surveys (CSUR)
Rank and select revisited and extended

Theoretical Computer Science
Compressed Suffix Trees with Full Functionality

Theory of Computing Systems
Dynamic entropy-compressed sequences and full-text indexes

ACM Transactions on Algorithms (TALG)
Linear pattern matching algorithms

SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
A four-stage algorithm for updating a Burrows-Wheeler transform

Theoretical Computer Science
Succinct dynamic dictionaries and trees

ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Fully-compressed suffix trees

LATIN'08 Proceedings of the 8th Latin American conference on Theoretical informatics
Improved dynamic rank-select entropy-bound structures

LATIN'08 Proceedings of the 8th Latin American conference on Theoretical informatics

Compressed Suffix Arrays for Massive Data

SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Stream-based translation models for statistical machine translation

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Position heaps: A simple and dynamic text indexing data structure

Journal of Discrete Algorithms
Complex Event Detection in Extremely Resource-Constrained Wireless Sensor Networks

Mobile Networks and Applications
On suffix extensions in suffix trees

SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
On the number of elements to reorder when updating a suffix array

Journal of Discrete Algorithms
On-line suffix tree construction with reduced branching

Journal of Discrete Algorithms
On suffix extensions in suffix trees

Theoretical Computer Science
Efficient indexing techniques for record matching and deduplication

International Journal of Computational Vision and Robotics

Abstract

The suffix tree data structure was extensively described, studied and used in the 1980s and 1990s, its linear-time construction counterbalancing its space-consuming requirements. An equivalent data structure, the suffix array, was described by Manber and Myers in 1990. This space-economical structure was neglected for more than a decade because its construction was too slow. Since 2003, several linear-time suffix array construction algorithms have been proposed, and this structure has slowly replaced the suffix tree in many string processing problems. All these algorithms build the suffix array from the text, so any edit operation on the text requires the construction of a brand new suffix array. In this article, we present an algorithm that updates the suffix array and the Longest Common Prefix (LCP) array when the text is edited (insertion, substitution or deletion of a letter or a factor). This algorithm is based on a recent four-stage algorithm developed for dynamic Burrows-Wheeler Transforms (BWT). To minimize the space complexity, we sample the Suffix Array, a technique used in BWT-based compressed indexes. We furthermore explain how this technique can be adapted to maintain a sample of the Extended Suffix Array, containing a sample of the Suffix Array, a sample of the Inverse Suffix Array and the whole LCP array. Our experiments show that the algorithm performs very well in practice, running faster than the fastest suffix array construction algorithm.
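
For orientation, the structures named in the abstract can be sketched in a few lines of Python. This is not the dynamic algorithm of the article. It is only a naive static baseline, assuming a text terminated by a sentinel `$` that is smaller than every other letter. All function names are hypothetical, and the LCP computation follows the well-known approach of Kasai et al. rather than anything stated in the abstract. The last function shows the "rebuild everything after an edit" strategy that a dynamic update is meant to avoid.

```python
def suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def inverse_suffix_array(sa):
    """isa[pos] = rank of the suffix starting at pos in the suffix array."""
    isa = [0] * len(sa)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa


def lcp_array(text, sa):
    """lcp[r] = longest common prefix length of suffixes sa[r-1] and sa[r].

    lcp[0] is set to 0 by convention (Kasai et al. style computation).
    """
    n = len(text)
    isa = inverse_suffix_array(sa)
    lcp = [0] * n
    h = 0
    for i in range(n):
        r = isa[i]
        if r > 0:
            j = sa[r - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[r] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def bwt_from_sa(text, sa):
    """BWT[r] is the letter that cyclically precedes the suffix sa[r]."""
    return "".join(text[i - 1] for i in sa)


def rebuild_after_insertion(text, pos, letter):
    """Static baseline (assumption for illustration): edit, then rebuild all."""
    edited = text[:pos] + letter + text[pos:]
    sa = suffix_array(edited)
    return edited, sa, lcp_array(edited, sa)


text = "banana$"
sa = suffix_array(text)
print(sa)                     # [6, 5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))    # [0, 0, 1, 3, 0, 0, 2]
print(bwt_from_sa(text, sa))  # annb$aa
```

The link between `bwt_from_sa` and the suffix array is what lets an update procedure designed for the BWT be carried over to the suffix array: both list the suffixes in the same sorted order.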