Bit-Parallel Witnesses and Their Applications to Approximate String Matching

Authors:
Heikki Hyyrö; Gonzalo Navarro
Affiliations:
Department of Computer Sciences, Kanslerinrinne 1, FIN-33014 University of Tampere, Finland; Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile
Venue:
Algorithmica
Year:
2005

Citing 0
Cited 10

Increased bit-parallelism for approximate and multiple string matching

Journal of Experimental Algorithmics (JEA)
On-line Approximate String Matching in Natural Language

Fundamenta Informaticae
High-error approximate dictionary search using estimate hash comparisons

Software—Practice & Experience
Efficient computations of gapped string kernels based on suffix kernel

Neurocomputing
Nested Counters in Bit-Parallel String Matching

LATA '09 Proceedings of the 3rd International Conference on Language and Automata Theory and Applications
Average-optimal string matching

Journal of Discrete Algorithms
Approximate all-pairs suffix/prefix overlaps

CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Approximate all-pairs suffix/prefix overlaps

Information and Computation
A fast bit-parallel algorithm for gapped string kernels

ICONIP'06 Proceedings of the 13th international conference on Neural Information Processing - Volume Part I

Abstract

We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first one, BPM (Myers, 1999), searches for a pattern of length m in a text of length n permitting k differences in $O(\lceil m/w \rceil n)$ time, where w is the width of the computer word. The second one, ABNDM (Navarro and Raffinot, 2000), extends a sublinear-time exact algorithm to approximate searching. ABNDM relies on another algorithm, BPA (Wu and Manber, 1992), which makes use of an $O(k \lceil m/w \rceil n)$ time algorithm for its internal workings. BPA is slow but flexible enough to support all operations required by ABNDM. We improve previous ABNDM analyses, showing that it is average-optimal in number of inspected characters, although the overall complexity is higher because of the $O(k \lceil m/w \rceil )$ work done per inspected character. We then show that the faster BPM can be adapted to support all the operations required by ABNDM. This involves extending it to compute edit distance, to search for any pattern suffix, and to detect in advance the impossibility of a later match. The solution to those challenges is based on the concept of a witness, which permits sampling some dynamic programming matrix values to bound, deduce or compute others fast. The resulting algorithm is average-optimal for m ≤ w, assuming the alphabet size is constant. In practice, it performs better than the original ABNDM and is the fastest algorithm for several combinations of m, k and alphabet sizes that are useful, for example, in natural language searching and computational biology. To show that the concept of witnesses can be used in further scenarios, we also improve a recent variant of BPM. The use of witnesses greatly improves the running time of this algorithm too.
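
To make the BPM building block concrete, here is a minimal sketch of Myers-style bit-parallel approximate searching in the single-word case (m ≤ w), written in the commonly presented simplified formulation. It is not the witness-based algorithm of this paper. Python integers stand in for a machine word, so results are masked to m bits explicitly. The function names `bpm_search` and `naive_search` and the convention of reporting 0-based end positions are assumptions made for illustration; `naive_search` is a plain dynamic programming reference included only for cross-checking.

```python
def bpm_search(pattern, text, k):
    """Return the 0-based text positions where an occurrence of `pattern`
    with at most k differences ends (bit-parallel column updates)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1          # keep only the m low bits (simulated word)
    high = 1 << (m - 1)          # bit of the last pattern row

    # peq[c] has bit i set iff pattern[i] == c
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    pv, mv = mask, 0             # vertical deltas: all +1 in the initial column
    score = m                    # value of the last cell of the current column
    ends = []
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal delta +1
        mh = pv & xh                    # horizontal delta -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in 0: the top row is all zeros when searching a text
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            ends.append(j)
    return ends


def naive_search(pattern, text, k):
    """Reference O(mn) dynamic programming with the same output convention."""
    m = len(pattern)
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        new = [0]                # top row is 0: a match may start anywhere
        for i in range(1, m + 1):
            new.append(min(col[i - 1] + (pattern[i - 1] != c),
                           col[i] + 1,
                           new[i - 1] + 1))
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends
```

Each text character costs a constant number of word operations here, which is where the $O(\lceil m/w \rceil n)$ bound comes from when m ≤ w. The sketch only tracks the last cell of each column; the extensions the abstract describes (edit distance computation, matching any pattern suffix, early detection of impossible matches) are not covered.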