Efficient Experimental String Matching by Weak Factor Recognition

  • Authors:
  • Cyril Allauzen; Maxime Crochemore; Mathieu Raffinot

  • Venue:
  • CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
  • Year:
  • 2001

Abstract

We introduce a new notion of weak factor recognition that is the foundation of new data structures and on-line string matching algorithms. We define a new automaton built on a string p = p_1 p_2 ... p_m that acts like an oracle on the set of factors p_i ... p_j. If a string is recognized by this automaton, it may be a factor of p; if it is rejected, it is certainly not a factor. We call this automaton the factor oracle. More precisely, the automaton is acyclic, recognizes at least the factors of p, has m + 1 states, and has a linear number of transitions. We give a very simple sequential construction algorithm to build it. Using this automaton, we design an experimentally efficient on-line string matching algorithm that is really simple to implement (we conjecture its optimality in view of the experimental results). We also extend the factor oracle to predict whether a string could be a suffix of p (i.e., belong to the set p_i ... p_m). We obtain the suffix oracle, which in some cases enables a tricky improvement of the previous string matching algorithm.
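
The abstract does not spell out the construction or the matching algorithm, so the following Python sketch is only an illustration. It uses the on-line construction commonly associated with the factor oracle and a backward-scanning search over a window of the text. The names build_factor_oracle, accepts, oracle_search, and the supply array are hypothetical choices made for this example, not taken from the paper, and the final equality check in the search is a defensive addition.

```python
from typing import Dict, List


def build_factor_oracle(p: str) -> List[Dict[str, int]]:
    """Return transitions of a factor oracle for p (states 0..len(p)).

    Sketch of the usual on-line construction; `supply` is a helper
    array of back-links used only while building (assumed name).
    """
    m = len(p)
    delta: List[Dict[str, int]] = [{} for _ in range(m + 1)]
    supply = [-1] * (m + 1)
    for i in range(1, m + 1):
        c = p[i - 1]
        delta[i - 1][c] = i              # transition along the pattern
        k = supply[i - 1]
        # Add shortcut transitions from earlier states lacking one on c.
        while k > -1 and c not in delta[k]:
            delta[k][c] = i
            k = supply[k]
        supply[i] = 0 if k == -1 else delta[k][c]
    return delta


def accepts(delta: List[Dict[str, int]], w: str) -> bool:
    """Every state is accepting: w is accepted if it can be read from state 0."""
    state = 0
    for c in w:
        if c not in delta[state]:
            return False
        state = delta[state][c]
    return True


def oracle_search(p: str, text: str) -> List[int]:
    """Report start positions of p in text by scanning each window backward."""
    m, n = len(p), len(text)
    if m == 0:
        return []
    delta = build_factor_oracle(p[::-1])  # oracle of the reversed pattern
    hits: List[int] = []
    pos = 0
    while pos <= n - m:
        state, j = 0, m
        # Read the window right to left until the oracle rejects.
        while j > 0 and state is not None:
            state = delta[state].get(text[pos + j - 1])
            j -= 1
        # Defensive check (an assumption of this sketch): confirm the match.
        if state is not None and text[pos:pos + m] == p:
            hits.append(pos)
        # A rejected suffix is surely not a factor, so skip past it.
        pos += j + 1
    return hits
```

For p = "abbbaab", this construction yields 8 states, and accepts(build_factor_oracle("abbbaab"), "aba") returns True even though "aba" is not a factor of p, which illustrates the "may be a factor" side of the oracle. Rejection is the reliable direction: it is what justifies the window shift in oracle_search.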