An efficient linear pseudo-minimization algorithm for Aho-Corasick automata

  • Authors:
  • Omar AitMous;Frédérique Bassino;Cyril Nicaud

  • Affiliations:
  • LIPN, UMR 7030, Université Paris 13 - CNRS, Villetaneuse, France;LIPN, UMR 7030, Université Paris 13 - CNRS, Villetaneuse, France;LIGM, UMR 8049, Université Paris-Est - CNRS, Marne-la-Vallée, France

  • Venue:
  • CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
  • Year:
  • 2012

Abstract

A classical construction of Aho and Corasick solves the pattern matching problem for a finite set of words X in linear time, where the size of the input X is the sum of the lengths of its elements. It produces an automaton that recognizes A*X, where A is a finite alphabet, but which is generally not minimal. As an alternative to classical minimization algorithms, which yield an $\mathcal{O}(n \log n)$ solution to the problem, we propose a linear pseudo-minimization algorithm specific to Aho-Corasick automata, which produces an automaton whose size lies between that of the input automaton and that of its associated minimal automaton. Moreover, this algorithm generically computes the minimal automaton: for a large variety of natural distributions, the probability that the output is the minimal automaton of A*X tends to one as the size of X tends to infinity.
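
To make the gap between the Aho-Corasick automaton and the minimal automaton of A*X concrete, here is a minimal Python sketch. It is not the pseudo-minimization algorithm of the paper, which the abstract does not spell out. It only builds the classical automaton as a complete DFA and, for comparison, counts the states of the minimal automaton with a naive Moore-style partition refinement (which is not linear). The function names `build_aho_corasick`, `accepts` and `minimal_size` are hypothetical, and the sketch assumes every letter of every word in X belongs to the alphabet.

```python
from collections import deque

def build_aho_corasick(words, alphabet):
    """Return (delta, final) for a complete DFA recognizing A*X.

    States are integers, state 0 is the initial state (empty prefix).
    Assumption: every letter of every word belongs to `alphabet`.
    """
    delta = [{}]      # delta[state][letter] -> state
    final = [False]   # final[state] is True if the state is accepting
    # Step 1: build the trie of X; each state is a prefix of a word of X.
    for w in words:
        s = 0
        for a in w:
            if a not in delta[s]:
                delta.append({})
                final.append(False)
                delta[s][a] = len(delta) - 1
            s = delta[s][a]
        final[s] = True
    # Step 2: breadth-first traversal that fills in missing transitions
    # through failure links (longest proper suffix that is also a prefix).
    fail = [0] * len(delta)
    queue = deque()
    for a in alphabet:
        if a in delta[0]:
            queue.append(delta[0][a])
        else:
            delta[0][a] = 0
    while queue:
        s = queue.popleft()
        # A state is final if some suffix of its prefix is a word of X.
        final[s] = final[s] or final[fail[s]]
        for a in alphabet:
            if a in delta[s]:
                t = delta[s][a]
                fail[t] = delta[fail[s]][a]
                queue.append(t)
            else:
                delta[s][a] = delta[fail[s]][a]
    return delta, final

def accepts(delta, final, text):
    """Run the DFA on `text`; True if `text` ends with a word of X."""
    s = 0
    for a in text:
        s = delta[s][a]
    return final[s]

def minimal_size(delta, final, alphabet):
    """Number of states of the minimal DFA, by naive Moore refinement.

    All states of the automaton built above are reachable, so the number
    of equivalence classes is the size of the minimal automaton.
    """
    cls = [int(f) for f in final]
    while True:
        signatures = {}
        new_cls = []
        for s in range(len(delta)):
            key = (cls[s],) + tuple(cls[delta[s][a]] for a in alphabet)
            new_cls.append(signatures.setdefault(key, len(signatures)))
        if len(signatures) == len(set(cls)):
            return len(signatures)
        cls = new_cls

# Example: X = {"ab", "b"} over A = {a, b}. Here A*X is the set of words
# ending with "b", so the minimal automaton needs only 2 states.
delta, final = build_aho_corasick(["ab", "b"], "ab")
print(len(delta))                        # 4 states in the Aho-Corasick automaton
print(minimal_size(delta, final, "ab"))  # 2 states in the minimal automaton
```

In this example the Aho-Corasick automaton has one state per prefix of X (four states), while the minimal automaton of A*X has two, which illustrates why the classical construction is generally not minimal.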