Predicting string search speed

  • Authors: Arthur Gittleman
  • Affiliations: -
  • Venue: Journal of Experimental Algorithmics (JEA)
  • Year: 1996

Abstract

String search is fundamental to many text processing applications. Sunday recently gave several algorithms for finding the first occurrence of a pattern string as a substring of a text, and supported his claim that they are faster than the standard Boyer-Moore algorithm with experimental data from searches in a text of about 200K characters. We present a methodology for the average-case analysis of the performance of string search algorithms. For such algorithms, a worst-case analysis does not yield much useful information, since performance is directly affected by such characteristics as the size of the character set, the character frequencies, and the structure of the text. Knuth described a finite automaton that can be used to save information about character comparisons. Baeza-Yates, Gonnet, and Regnier gave a probabilistic analysis of the worst- and average-case behavior of a string search algorithm based upon such an automaton. We construct Knuth automata to model Sunday's algorithms and use the methods of Baeza-Yates et al. to obtain an average-case analysis that confirms Sunday's experimental data.
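
As a concrete point of reference for the kind of algorithm being analyzed, the following is a minimal sketch of one of Sunday's algorithms, commonly called Quick Search, instrumented to count character comparisons. The abstract does not say which variants the paper models, so the choice of Quick Search, the left-to-right comparison order, and the name `quick_search` are assumptions made for illustration. It is not the Knuth automaton construction itself.

```python
def quick_search(pattern, text):
    """Sketch of Sunday's Quick Search (illustrative, not the paper's code).

    Returns (index of the first occurrence or -1, character comparisons made).
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0

    # Shift table keyed on the text character just past the current window:
    # distance from that character's last occurrence in the pattern to the
    # position one past the pattern's end. Absent characters shift by m + 1.
    shift = {ch: m - i for i, ch in enumerate(pattern)}

    comparisons = 0
    pos = 0
    while pos + m <= n:
        # Compare the window against the pattern, left to right (assumed order).
        i = 0
        while i < m:
            comparisons += 1
            if text[pos + i] != pattern[i]:
                break
            i += 1
        if i == m:
            return pos, comparisons
        if pos + m >= n:
            break  # no character past the window, so no further alignment
        pos += shift.get(text[pos + m], m + 1)
    return -1, comparisons


if __name__ == "__main__":
    print(quick_search("abc", "xxabcxx"))
```

Dividing the comparison count by the number of text characters scanned, and averaging over many texts, would give an empirical figure of the kind an average-case analysis aims to predict.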