New indices for text: PAT Trees and PAT arrays
Information retrieval
Suffix arrays: a new method for on-line string searches
SODA '90 Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
On the Performance of BWT Sorting Algorithms
DCC '00 Proceedings of the Conference on Data Compression
Opportunistic data structures with applications
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Compressed representations of sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
A taxonomy of suffix array construction algorithms
ACM Computing Surveys (CSUR)
Compressed indexing and local alignment of DNA
Bioinformatics
A four-stage algorithm for updating a Burrows-Wheeler transform
Theoretical Computer Science
Dynamic extended suffix arrays
Journal of Discrete Algorithms
Space efficient linear time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Simple linear work suffix array construction
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Computing the longest common prefix array based on the Burrows-Wheeler transform
Journal of Discrete Algorithms
Hi-index | 0.00 |
Recently new algorithms appeared for updating the Burrows-Wheeler Transform or the suffix array, when the text they index is modified. These algorithms proceed by reordering entries and the number of such reordered entries may be as high as the length of the text. However, in practice, these algorithms are faster for updating the Burrows-Wheeler Transform or the suffix array than the fastest reconstruction algorithms. In this article we focus on the number of elements to be reordered for real-life texts. We show that this number is related to LCP values and that, on average, L"a"v"e entries are reordered, where L"a"v"e denotes the average LCP value, defined as the average length of the longest common prefix between two consecutive sorted suffixes. Since we know little about the LCP distribution for real-life texts, we conduct experiments on a corpus that consists of DNA sequences and natural language texts. The results show that apart from texts containing large repetitions, the average LCP value is close to the one expected on a random text.