We present a four-stage algorithm that updates the Burrows-Wheeler Transform of a text T when this text is modified. The Burrows-Wheeler Transform is used by many text compression applications and some self-index data structures. It operates by reordering the letters of a text T to obtain a new text bwt(T) that compresses better. Although recent advances have opened new applications for this structure, a major bottleneck remains: bwt(T) has to be entirely reconstructed from scratch whenever T is modified. We study how standard edit operations (insertion, deletion, or substitution of a letter or a factor) that transform a text T into T' impact bwt(T). We then present an algorithm that directly converts bwt(T) into bwt(T'). Based on this algorithm, we also sketch a method for converting the suffix array of T into the suffix array of T'. Finally, we show experimentally that this algorithm, whose worst-case time complexity is O(|T| log |T| (1 + log σ / log log |T|)), where σ is the alphabet size, performs well in practice and is an advantageous replacement for the traditional approach of rebuilding from scratch.
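To make the bottleneck concrete, the following minimal Python sketch computes bwt(T) by sorting rotations and then recomputes it from scratch after a single-letter insertion. It is not the four-stage algorithm described above; it only illustrates the traditional rebuild that the algorithm is meant to avoid. The function names, the `$` end-of-text sentinel, and the example text are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text+sentinel, keep the last column.

    Assumes the sentinel does not occur in text and sorts before every
    other letter (true for '$' against ASCII letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def suffix_array(text: str, sentinel: str = "$") -> list[int]:
    """Naive suffix array: start positions of the suffixes in sorted order."""
    s = text + sentinel
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt_from_suffix_array(text: str, sentinel: str = "$") -> str:
    """BWT read off the suffix array: the letter preceding each sorted suffix."""
    s = text + sentinel
    # For the suffix starting at 0, s[-1] is the sentinel, as required.
    return "".join(s[i - 1] for i in suffix_array(text, sentinel))


def insert_letter(text: str, position: int, letter: str) -> str:
    """Edit operation T -> T': insert one letter at the given position."""
    return text[:position] + letter + text[position:]


if __name__ == "__main__":
    t = "banana"
    print(bwt(t))  # annb$aa

    # Traditional approach: after the edit, rebuild bwt(T') from scratch.
    t_prime = insert_letter(t, 2, "x")
    print(bwt(t_prime))
```

The link between `bwt_from_suffix_array` and `suffix_array` is why a procedure that updates bwt(T) can also guide an update of the suffix array: each position of the transform corresponds to one sorted suffix.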