Improving the bit-parallel NFA of Baeza-Yates and Navarro for approximate string matching

  • Authors:
  • Heikki Hyyrö

  • Affiliations:
  • Department of Computer Sciences, University of Tampere, Finland

  • Venue:
  • Information Processing Letters
  • Year:
  • 2008

Quantified Score

Hi-index 0.89

Visualization

Abstract

We propose a new variant of the bit-parallel NFA of Baeza-Yates and Navarro (BPD) for approximate string matching [R. Baeza-Yates, G. Navarro, Faster approximate string matching, Algorithmica 23 (1999) 127-158]. BPD is one of the most practical approximate string matching algorithms under moderate pattern lengths and error levels [G. Myers, A fast bit-vector algorithm for approximate string matching based on dynamic programming, J. ACM 46 (3) 1989 395-415; G. Navarro, M. Raffinot, Flexible Pattern Matching in Strings-Practical On-line Search Algorithms for Texts and Biological Sequences, Cambridge University Press, Cambridge, UK, 2002]. Given a length-m pattern and an error threshold k, the original BPD requires (m-k)(k+2) bits of space to represent an NFA with (m-k)(k+1) states. In this paper we remove redundancy from the original NFA representation. Our variant requires (m-k)(k+1) bits of space, which is optimal in the sense that exactly one bit per state is used. The space efficiency is achieved by using an alternative, but equally or even more efficient, simulation algorithm for the bit-parallel NFA. We also present experimental results to compare our modified NFA against the original BPD and its main competitors. Our new variant is more efficient than the original BPD, and it hence takes over/extends the role of the original BPD as one of the most practical approximate string matching algorithms under moderate values of k and m.