Correctness of substring-preprocessing in Boyer-Moore's pattern matching algorithm

  • Authors: Frank Stomp

  • Affiliations: Department of Computer Science, Wayne State University, Detroit, MI

  • Venue: Theoretical Computer Science
  • Year: 2003

Abstract

Preprocessing is one of the main reasons for the high efficiency of the fast pattern matching algorithm of Boyer and Moore. The Boyer-Moore algorithm uses two preprocessing algorithms: one on single characters and the other on substrings. It is the latter that makes pattern matching extremely fast, especially on natural language text. In this paper we present a formal correctness proof of the program describing the substring-preprocessing algorithm. The proof is carried out within linear-time temporal logic. During verification we found that indices of auxiliary arrays, as used in published high-level descriptions of the preprocessing algorithm, may run out of bounds. We demonstrate that this is the case and correct the problem.
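
For readers unfamiliar with substring preprocessing, the following is a minimal sketch of one common textbook formulation of the good-suffix shift table, together with a search loop that uses only that table. It is not the program verified in the paper, and it does not reproduce the paper's correction. The names good_suffix_shifts, find_all, shift and border are illustrative assumptions. Note that the auxiliary array border stores the sentinel value m + 1, which is one past its last valid index; the guard j <= m in the inner loop is what keeps that value from ever being used as an index.

```python
def good_suffix_shifts(pattern):
    """Return shift[0..m] for the (strong) good-suffix rule.

    shift[j] is a safe shift when pattern[j:] matched and pattern[j-1]
    mismatched; shift[0] is the shift after a full match.
    Illustrative textbook formulation, not the paper's program.
    """
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start of widest border of pattern[i:]

    # Case 1: the matched suffix occurs again inside the pattern.
    i, j = m, m + 1
    border[i] = j  # sentinel m + 1 is stored, never used as an index
    while i > 0:
        # The guard j <= m keeps every array access in bounds.
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j

    # Case 2: only a suffix of the matched part is a prefix of the pattern.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def find_all(text, pattern):
    """Return all start positions of pattern in text (good-suffix rule only)."""
    m, n = len(pattern), len(text)
    shift = good_suffix_shifts(pattern)
    hits = []
    s = 0
    while s <= n - m:
        j = m - 1
        # Compare right to left, as in Boyer-Moore.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += shift[0]
        else:
            s += shift[j + 1]
    return hits


if __name__ == "__main__":
    print(find_all("aaa", "aa"))  # [0, 1]
```

A full Boyer-Moore implementation would combine this table with the single-character (bad-character) table and take the larger of the two shifts; the sketch omits that to keep the substring preprocessing in focus.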