Fast text searching: allowing errors
Communications of the ACM
Text algorithms
A subquadratic algorithm for approximate regular expression matching
Journal of Algorithms
A fast bit-vector algorithm for approximate string matching based on dynamic programming
Journal of the ACM (JACM)
A technique for computer detection and correction of spelling errors
Communications of the ACM
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences
Tighter packed bit-parallel NFA for approximate string matching
CIAA'06 Proceedings of the 11th international conference on Implementation and Application of Automata
New bit-parallel indel-distance algorithm
WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms
Hi-index | 0.89 |
We propose a new variant of the bit-parallel NFA of Baeza-Yates and Navarro (BPD) for approximate string matching [R. Baeza-Yates, G. Navarro, Faster approximate string matching, Algorithmica 23 (1999) 127-158]. BPD is one of the most practical approximate string matching algorithms under moderate pattern lengths and error levels [G. Myers, A fast bit-vector algorithm for approximate string matching based on dynamic programming, J. ACM 46 (3) 1989 395-415; G. Navarro, M. Raffinot, Flexible Pattern Matching in Strings-Practical On-line Search Algorithms for Texts and Biological Sequences, Cambridge University Press, Cambridge, UK, 2002]. Given a length-m pattern and an error threshold k, the original BPD requires (m-k)(k+2) bits of space to represent an NFA with (m-k)(k+1) states. In this paper we remove redundancy from the original NFA representation. Our variant requires (m-k)(k+1) bits of space, which is optimal in the sense that exactly one bit per state is used. The space efficiency is achieved by using an alternative, but equally or even more efficient, simulation algorithm for the bit-parallel NFA. We also present experimental results to compare our modified NFA against the original BPD and its main competitors. Our new variant is more efficient than the original BPD, and it hence takes over/extends the role of the original BPD as one of the most practical approximate string matching algorithms under moderate values of k and m.