String pattern matching in polynomial time

  • Authors:
  • K. C. Liu; A. C. Fleck

  • Affiliations:
  • University of Wisconsin-Milwaukee; The University of Iowa

  • Venue:
  • POPL '79 Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
  • Year:
  • 1979

Abstract

There is a wide range of applications for string processing, and SNOBOL4 (Griswold et al. [1971]) has come to be the most widely implemented and accepted language for such applications. No doubt one of the principal reasons for this acceptance is the data structure around which the language is organized, the string pattern. This structure, together with the associated pattern matching process, provides great flexibility. Nevertheless, it has been widely recognized in informal terms that the pattern matching process is often grossly inefficient (Ripley & Griswold [1975], Dewar & McCann [1977]) and that the pattern structure is notoriously difficult to explain and use (Ripley & Griswold [1975], Stewart [1975]). Each of these areas of difficulty relates to such things as two modes of operation (quick-full scan), problems with left-recursion, heuristics in the scan, etc. Some difficulties are inherent in string patterns but many are not; we feel the developments described here help to clarify this situation.

In section 2 we describe the formal model upon which we base this work. This allows careful analysis of the variety of sets of strings that may be specified by the patterns we admit, and allows deductions to be made concerning the possibility or impossibility of algorithms of interest. With SNOBOL4, the careful definition of the "meaning" of a pattern has been given in terms of the actions taken by the pattern matching algorithm. This has led to the incorporation of idiosyncrasies of a particular algorithm into the understanding of the pattern structure. This seems akin to using a compiler as the definition of a programming language, and we believe it is important to future progress to have other alternatives.

In section 3 we point out that the worst-case execution time of the usual SNOBOL pattern matching algorithm is exponential in the length of the subject string, even on some quite simple patterns. We then present an algorithm whose worst-case time is polynomial and that operates on patterns which include a true set complement operator. As side benefits, we find that the algorithm is not multi-modal and that it correctly handles left-recursion and the null string as an alternative.

In order to conserve space, we will assume throughout this paper that the reader is familiar with the idea of a string pattern in the sense described in Griswold et al. [1971]. It is also probably necessary that the reader have some general knowledge of the formal languages area.
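
The contrast between exponential backtracking and a polynomial bound can be illustrated with a small sketch. This is not the algorithm from the paper and does not reproduce SNOBOL4 semantics; it is a toy model under stated assumptions. The tuple encoding of patterns, the `star` operator (loosely in the spirit of ARBNO), and all function names are hypothetical. The first matcher enumerates one result per derivation, as a backtracking scan would; the second computes the set of reachable end positions once per (subpattern, start position) pair.

```python
# Illustrative toy only: NOT the paper's algorithm and NOT SNOBOL4 semantics.
# Hypothetical pattern encoding as nested tuples:
#   ("lit", text), ("alt", p, q), ("cat", p, q), ("star", p)  # star: zero or more p

def backtrack(pat, s, i, counter):
    """Yield an end position for every derivation of pat starting at s[i:]."""
    counter[0] += 1  # count every invocation
    kind = pat[0]
    if kind == "lit":
        if s.startswith(pat[1], i):
            yield i + len(pat[1])
    elif kind == "alt":
        yield from backtrack(pat[1], s, i, counter)
        yield from backtrack(pat[2], s, i, counter)
    elif kind == "cat":
        for j in backtrack(pat[1], s, i, counter):
            yield from backtrack(pat[2], s, j, counter)
    elif kind == "star":
        yield i  # zero repetitions
        for j in backtrack(pat[1], s, i, counter):
            if j > i:  # skip empty iterations so the search terminates
                yield from backtrack(pat, s, j, counter)
    else:
        raise ValueError(f"unknown pattern kind: {kind!r}")

def end_positions(pat, s, i, memo, counter):
    """Return the set of end positions of pat at s[i:], computed once per (pat, i)."""
    key = (pat, i)
    if key in memo:
        return memo[key]
    counter[0] += 1  # count only invocations that do real work
    kind = pat[0]
    if kind == "lit":
        result = {i + len(pat[1])} if s.startswith(pat[1], i) else set()
    elif kind == "alt":
        result = (end_positions(pat[1], s, i, memo, counter)
                  | end_positions(pat[2], s, i, memo, counter))
    elif kind == "cat":
        result = set()
        for j in end_positions(pat[1], s, i, memo, counter):
            result |= end_positions(pat[2], s, j, memo, counter)
    elif kind == "star":
        result = {i}
        for j in end_positions(pat[1], s, i, memo, counter):
            if j > i:  # only strictly longer prefixes, so recursion is well-founded
                result |= end_positions(pat, s, j, memo, counter)
    else:
        raise ValueError(f"unknown pattern kind: {kind!r}")
    memo[key] = frozenset(result)
    return memo[key]

def matches_backtracking(pat, s):
    """Anchored whole-string match; returns (matched, number of calls)."""
    counter = [0]
    ok = any(j == len(s) for j in backtrack(pat, s, 0, counter))
    return ok, counter[0]

def matches_memoized(pat, s):
    """Anchored whole-string match; returns (matched, number of non-cached calls)."""
    counter = [0]
    ok = len(s) in end_positions(pat, s, 0, {}, counter)
    return ok, counter[0]

# A fixed, simple pattern: zero or more of ("a" | "aa"), followed by "b".
PATTERN = ("cat", ("star", ("alt", ("lit", "a"), ("lit", "aa"))), ("lit", "b"))

if __name__ == "__main__":
    for n in (10, 15, 20):
        subject = "a" * n  # no "b", so every derivation must be exhausted
        print(n, matches_backtracking(PATTERN, subject), matches_memoized(PATTERN, subject))
```

On a failing subject made only of `a` characters, the backtracking matcher revisits the same positions once per way of splitting the prefix into `a` and `aa` pieces, so its call count grows roughly like the Fibonacci numbers, while the memoized matcher does work at most once for each subpattern and position. The sketch omits the complement operator and left-recursive patterns that the paper's algorithm is stated to handle.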