Predicting string search speed

Authors: Arthur Gittleman
Affiliations: -
Venue: Journal of Experimental Algorithmics (JEA)
Year: 1996

Citing (3):

A very fast substring search algorithm. Communications of the ACM.
Analysis of Boyer-Moore-type string searching algorithms. SODA '90 Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms.
A fast string searching algorithm. Communications of the ACM.

Cited by (2):

A new string-pattern matching algorithm using partitioning and hashing efficiently. Journal of Experimental Algorithmics (JEA).
Improving Boyer-Moore-Horspool using machine-words for comparison. Proceedings of the 48th Annual Southeast Regional Conference.

Abstract

String search is fundamental in many text processing applications. Sunday recently gave several algorithms to find the first occurrence of a pattern string as a substring of a text, providing experimental data from searches in a text of about 200K characters to support his claim that his algorithms are faster than the standard Boyer-Moore algorithm. We present a methodology for the average-case analysis of the performance of string search algorithms. For such algorithms, a worst-case analysis does not yield much useful information, since performance is directly affected by characteristics such as the size of the character set, the character frequencies, and the structure of the text. Knuth described a finite automaton that can be used to save information about character comparisons. Baeza-Yates, Gonnet, and Regnier gave a probabilistic analysis of the worst- and average-case behavior of a string search algorithm based upon such an automaton. We construct Knuth automata to model Sunday's algorithms and use the methods of Baeza-Yates et al. to obtain an average-case analysis that confirms Sunday's experimental data.
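
For readers unfamiliar with the algorithms being analyzed, the following is a minimal sketch of the simplest of Sunday's variants, commonly called Quick Search, instrumented to count character comparisons (the quantity an average-case analysis of this kind would predict). It is an illustration only, not the paper's automaton-based method: the function name `quick_search`, the left-to-right comparison order, and the returned comparison count are assumptions made for this example.

```python
def quick_search(pattern, text):
    """Find the first occurrence of pattern in text.

    Returns (index, comparisons): index is -1 if the pattern does not
    occur; comparisons counts character-to-character comparisons made.
    Hypothetical helper written for illustration.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0

    # Shift table: for each pattern character, the distance from its
    # rightmost occurrence to the position just past the pattern's end.
    shift = {}
    for i, ch in enumerate(pattern):
        shift[ch] = m - i

    comparisons = 0
    pos = 0
    while pos + m <= n:
        # Compare the window against the pattern (left to right here;
        # this order is an assumption of the sketch).
        j = 0
        while j < m:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return pos, comparisons
        if pos + m >= n:
            break  # no character follows the window, so no shift is possible
        # Shift using the text character just past the current window;
        # characters absent from the pattern allow a shift of m + 1.
        pos += shift.get(text[pos + m], m + 1)
    return -1, comparisons
```

Dividing the comparison count by the text length, for texts drawn over alphabets of different sizes or with different character frequencies, gives a rough empirical view of the dependence on text characteristics that the abstract describes.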