Approximate pattern matching is an important computational problem with a wide variety of applications in Information Retrieval. Efficient solutions to approximate pattern matching can be applied to natural language keyword queries with spelling mistakes, OCR-scanned text incorporated into indexes, language-model ranking algorithms based on term proximity, or DNA databases containing sequencing errors. In this paper, we present a novel approach to constructing text indexes that efficiently support approximate search queries. Our approach relies on a new variant of the Context Bound Burrows-Wheeler Transform (k-bwt), which we call the Variable Depth Burrows-Wheeler Transform (v-bwt). First, we describe the new algorithm and show that it is reversible. Next, we show how to use the transform to support efficient text indexing and approximate pattern matching. Lastly, we empirically evaluate the v-bwt on DNA and English text collections and show a significant improvement in approximate search efficiency over traditional q-gram-based approximate pattern matching algorithms.
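The v-bwt is presented as a variant of the context-bound BWT (k-bwt), so it may help to see the base transform concretely. The following is a minimal sketch that assumes the common definition of the k-bwt: the n cyclic rotations of a sentinel-terminated text are sorted by only their first k symbols, rotations with equal k-symbol contexts keep their original text order, and the output is the last symbol of each rotation. The function names `kbwt` and `bwt` are hypothetical. The sketch covers neither the v-bwt's variable-depth sorting rule, which the abstract does not spell out, nor the inverse transform.

```python
def kbwt(text: str, k: int) -> str:
    """Context-bound BWT (k-bwt) sketch: sort cyclic rotations by their
    first k symbols only.

    Python's sorted() is stable, so rotations that share the same
    k-symbol context stay in ascending text-position order. This tie
    rule is what distinguishes the k-bwt from the full BWT.
    """
    n = len(text)
    doubled = text + text          # rotation i is doubled[i:i + n]
    depth = min(k, n)              # contexts longer than n add nothing
    # Sort rotation start positions by their length-`depth` context.
    order = sorted(range(n), key=lambda i: doubled[i:i + depth])
    # The last symbol of rotation i is the symbol preceding position i
    # (text[-1] when i == 0, because the rotation is cyclic).
    return "".join(text[i - 1] for i in order)


def bwt(text: str) -> str:
    """Full BWT for comparison: the k-bwt with k equal to the text length."""
    return kbwt(text, len(text))


if __name__ == "__main__":
    sample = "banana$"
    for k in (1, 2, 4):
        print(k, kbwt(sample, k))
    print("full", bwt(sample))
```

On `banana$`, `kbwt(sample, 1)` yields `abnn$aa`. From k = 4 upward, every rotation already has a distinct context, so the output equals the full BWT `annb$aa`. The slice-based sort keys keep the sketch short at a cost of roughly O(n k log n) time, so the code is meant as an illustration rather than an indexing-grade construction.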