A locally adaptive data compression scheme
Communications of the ACM
A fast string searching algorithm
Communications of the ACM
An experimental study of an opportunistic index
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
ER '98 Proceedings of the Workshops on Data Warehousing and Data Mining: Advances in Database Technologies
Opportunistic data structures with applications
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
DNA Sequence Compression Using the Burrows-Wheeler Transform
CSB '02 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Locating All Tandem Repeat Families in a Sequence
CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
Pattern Matching in LZW Compressed Files
IEEE Transactions on Computers
BWT-based efficient shape matching
Proceedings of the 2007 ACM symposium on Applied computing
The SBC-tree: an index for run-length compressed sequences
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Multi-key binary search and the related performance
MATH'08 Proceedings of the American Conference on Applied Mathematics
Dependability Improvement for PPM Compressed Data by Using Compression Pattern Matching
IEICE - Transactions on Information and Systems
Accelerating Boyer-Moore searches on binary texts
Theoretical Computer Science
Accelerating Boyer Moore searches on binary texts
CIAA'07 Proceedings of the 12th international conference on Implementation and application of automata
This paper explores two techniques for on-line exact pattern matching in files that have been compressed using the Burrows-Wheeler transform. The first is an application of the Boyer-Moore algorithm (Boyer & Moore 1977) to a transformed string. The second is based on the observation that the transform effectively contains a sorted list of all substrings of the original text, which can be exploited for very rapid searching using a variant of binary search. Both methods are faster than a decompress-and-search approach for small numbers of queries, and binary search is much faster even for large numbers of queries.
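
The binary-search idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`bwt_transform`, `decode_bwt`, `find_occurrences`) and the `"\0"` end-of-text sentinel are assumptions made for illustration, and the sketch rebuilds the whole text for clarity, so it does not reproduce the speed advantage reported above. It only shows that the transform output alone determines the sorted order of the text's suffixes, and that a pattern can then be located by binary search over that order.

```python
from collections import Counter


def bwt_transform(text, sentinel="\0"):
    """Naive BWT: last column of the sorted rotations of text + sentinel.

    Assumes the sentinel does not occur in text and sorts before every
    other character (an illustrative convention, not taken from the paper).
    """
    s = text + sentinel
    order = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
    return "".join(s[i - 1] for i in order)


def decode_bwt(bwt):
    """Recover the text (with sentinel) and the sorted suffix order from the BWT."""
    n = len(bwt)
    counts = Counter(bwt)
    first, total = {}, 0
    for ch in sorted(counts):          # first[ch]: first sorted row starting with ch
        first[ch] = total
        total += counts[ch]
    seen = Counter()
    lf = []                            # lf[row]: row of the suffix one position earlier
    for ch in bwt:
        lf.append(first[ch] + seen[ch])
        seen[ch] += 1
    sa = [0] * n                       # sa[row]: text position of the row's suffix
    chars = [""] * n
    row = 0                            # row 0 is the suffix consisting of the sentinel
    for pos in range(n - 1, -1, -1):
        sa[row] = pos
        chars[pos - 1] = bwt[row]      # bwt[row] precedes the suffix at pos
        row = lf[row]
    return "".join(chars), sa


def find_occurrences(text, sa, pattern):
    """Binary search over the sorted suffixes for all positions of pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                     # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                     # first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text, sa = decode_bwt(bwt_transform("mississippi"))
print(find_occurrences(text, sa, "issi"))  # [1, 4]
```

Each query costs O(m log n) character comparisons for a pattern of length m and a text of length n, once the sorted order is available. This is consistent with the observation that binary search stays fast as the number of queries grows.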