Correctness of substring-preprocessing in Boyer-Moore's pattern matching algorithm

Authors:
Frank Stomp
Affiliations:
Department of Computer Science, Wayne State University, Detroit, MI
Venue:
Theoretical Computer Science
Year:
2003

Abstract

Preprocessing is one of the main reasons for the high efficiency of Boyer and Moore's fast pattern matching algorithm. The algorithm uses two preprocessing steps: one on single characters and one on substrings. It is the latter that makes pattern matching extremely fast, especially on natural-language text. In this paper we present a formal correctness proof of the program that implements the substring-preprocessing algorithm. The proof is carried out in linear-time temporal logic. During verification we found that indices of auxiliary arrays, as used in published high-level descriptions of the preprocessing algorithm, may run out of bounds. We demonstrate that this is the case and correct the defect.
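
For orientation, the substring ("good-suffix") preprocessing can be sketched as follows. This is not the program verified in the paper, nor its corrected version. It is a minimal illustration of a commonly presented table construction, with the auxiliary array computed by a deliberately simple quadratic loop so that every index is easy to check. The names suffix_lengths, good_suffix_shifts and search are hypothetical and chosen only for this sketch.

```python
def suffix_lengths(pattern):
    """suff[i] = length of the longest substring of pattern that ends at
    index i and is also a suffix of pattern (simple O(m^2) version)."""
    m = len(pattern)
    suff = [0] * m
    for i in range(m):
        k = 0
        # The guard k <= i keeps pattern[i - k] inside the array; without it
        # Python's negative indices would silently wrap around.
        while k <= i and pattern[i - k] == pattern[m - 1 - k]:
            k += 1
        suff[i] = k
    return suff


def good_suffix_shifts(pattern):
    """shift[i] = safe shift after a mismatch at pattern index i, given that
    pattern[i+1:] matched the text (assumed 'strong' good-suffix rule)."""
    m = len(pattern)
    suff = suffix_lengths(pattern)
    shift = [m] * m
    # Case A: only a prefix of the pattern lines up with the matched suffix.
    j = 0
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:  # pattern[0..i] is also a suffix of pattern
            while j < m - 1 - i:
                if shift[j] == m:
                    shift[j] = m - 1 - i
                j += 1
    # Case B: the matched suffix reoccurs inside the pattern.
    # For i < m - 1 we have suff[i] <= i + 1 <= m - 1, so the index
    # m - 1 - suff[i] never drops below 0.
    for i in range(m - 1):
        shift[m - 1 - suff[i]] = m - 1 - i
    return shift


def search(text, pattern):
    """Return all match positions, shifting with the good-suffix table only
    (the single-character table is left out of this sketch)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return []
    shift = good_suffix_shifts(pattern)
    hits = []
    j = 0
    while j <= n - m:
        i = m - 1
        while i >= 0 and pattern[i] == text[j + i]:
            i -= 1
        if i < 0:
            hits.append(j)
            j += shift[0]
        else:
            j += shift[i]
    return hits
```

The two comments on index ranges mark the kind of places where an out-of-bounds access could arise if a guard were omitted. Whether these correspond to the specific defect reported in the paper is not claimed here.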