A Boyer-Moore-style algorithm for regular expression pattern matching

  • Authors:
  • Bruce W. Watson;Richard E. Watson

  • Affiliations:
  • Department of Computer Science, University of Pretoria, Pretoria 0002, South Africa and Department of Computing Science, Eindhoven University of Technology, Eindhoven 5600MB, The Netherlands and F ...;FST Labs & Ribbit Software Systems Inc., Canada

  • Venue:
  • Science of Computer Programming
  • Year:
  • 2003

Abstract

This paper presents a Boyer-Moore-type algorithm for regular expression pattern matching, answering an open problem posed by Aho in 1980 (Pattern Matching in Strings, Academic Press, New York, 1980, p. 342). The new algorithm handles patterns specified by regular expressions, making it a generalization of the Boyer-Moore and Commentz-Walter algorithms.

Like the Boyer-Moore and Commentz-Walter algorithms, the new algorithm makes use of shift functions which can be precomputed and tabulated. The precomputation algorithms are derived, and it is shown that the required shift functions can be precomputed from Commentz-Walter's d1 and d2 shift functions.

In certain cases, the Boyer-Moore (respectively Commentz-Walter) algorithm has greatly outperformed the Knuth-Morris-Pratt (respectively Aho-Corasick) algorithm (as discussed by Watson in his Ph.D. Thesis, Eindhoven University of Technology, September 1995, and in: N. Ziviani, R. Baeza-Yates, K. Guimaraes (Eds.), Proc. Third South American Workshop on String Processing, International Informatics Series, vol. 4, Carleton University Press, Recife, Brazil, 1996, pp. 280-294). In testing, the algorithm presented in this paper also frequently outperforms the regular expression generalization of the Aho-Corasick algorithm.
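
To illustrate what a precomputed, tabulated shift function looks like, here is a minimal sketch of the Horspool simplification of Boyer-Moore for a single keyword. It is not the paper's regular expression algorithm and does not use Commentz-Walter's d1 and d2 functions; the names `build_shift_table` and `horspool_search` are hypothetical and chosen only for this example.

```python
from typing import Dict, List


def build_shift_table(pattern: str) -> Dict[str, int]:
    """Precompute Horspool-style shifts for a single keyword (illustrative only)."""
    m = len(pattern)
    # For every character except the last, store its distance to the end of
    # the pattern. Later occurrences overwrite earlier ones, so the rightmost
    # occurrence determines the shift.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text: str, pattern: str) -> List[int]:
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shift = build_shift_table(pattern)
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Look up the text character aligned with the pattern's last position;
        # characters absent from the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    print(horspool_search("abracadabra", "abra"))
```

The table is built once per pattern and consulted on every alignment, which is what lets the search skip over parts of the input without inspecting them. This skipping is the property that Boyer-Moore-type algorithms exploit.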