Searching BWT Compressed Text with the Boyer-Moore Algorithm and Binary Search

Authors:
Tim Bell;Matt Powell;Amar Mukherjee;Don Adjeroh
Venue:
DCC '02 Proceedings of the Data Compression Conference
Year:
2002

Citing 5
Cited 9

A locally adaptive data compression scheme

Communications of the ACM
A fast string searching algorithm

Communications of the ACM
An experimental study of an opportunistic index

SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
A Cooperative Distributed Text Database Management Method Unifying Search and Compression Based on the Burrows-Wheeler Transformation

ER '98 Proceedings of the Workshops on Data Warehousing and Data Mining: Advances in Database Technologies
Opportunistic data structures with applications

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science

DNA Sequence Compression Using the Burrows-Wheeler Transform

CSB '02 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Locating All Tandem Repeat Families in a Sequence

CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
Pattern Matching in LZW Compressed Files

IEEE Transactions on Computers
BWT-based efficient shape matching

Proceedings of the 2007 ACM symposium on Applied computing
The SBC-tree: an index for run-length compressed sequences

EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Multi-key binary search and the related performance

MATH'08 Proceedings of the American Conference on Applied Mathematics
Dependability Improvement for PPM Compressed Data by Using Compression Pattern Matching

IEICE - Transactions on Information and Systems
Accelerating Boyer-Moore searches on binary texts

Theoretical Computer Science
Accelerating Boyer Moore searches on binary texts

CIAA'07 Proceedings of the 12th international conference on Implementation and application of automata

Abstract

This paper explores two techniques for on-line exact pattern matching in files that have been compressed using the Burrows-Wheeler transform. The first is an application of the Boyer-Moore algorithm (Boyer & Moore 1977) to a transformed string. The second is based on the observation that the transform effectively contains a sorted list of all substrings of the original text, which can be exploited for very rapid searching using a variant of binary search. Both methods are faster than a decompress-and-search approach for small numbers of queries, and binary search is much faster even for large numbers of queries.
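
The binary-search idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the uncompressed text is available and builds an explicit, naively sorted list of suffix start positions, whereas the paper works from the Burrows-Wheeler transform output. The function names `suffix_array` and `find_occurrences` are hypothetical and chosen only for this example.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction for illustration only; it stands in for the sorted
    order that the Burrows-Wheeler transform makes available.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text.

    Two binary searches over the sorted suffixes locate the contiguous
    block of suffixes that begin with pattern.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


if __name__ == "__main__":
    text = "mississippi"
    sa = suffix_array(text)
    print(find_occurrences(text, sa, "ssi"))
```

Because every occurrence of the pattern is a prefix of some suffix, all matches sit next to each other in the sorted order, so each query costs a logarithmic number of string comparisons once the sorted order exists.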