Characteristic Sets for Polynomial Grammatical Inference

  • Authors:
  • Colin De La Higuera

  • Affiliations:
  • Département d'Informatique Fondamentale (DIF), LIRMM, 161 rue Ada, 34 392 Montpellier Cedex 5, France. E-mail: delahiguera@lirmm.fr http://www.lirmm.fr/~cdlh

  • Venue:
  • Machine Learning
  • Year:
  • 1997

Abstract

When concerned about efficient grammatical inference, two issues are relevant: the first is to determine the quality of the result, and the second is to try to use polynomial time and space. A typical idea to deal with the first point is to say that an algorithm performs well if it infers in the limit the correct language. The second point has led to debate about how to define polynomial time: the main definitions of polynomial inference have been proposed by Pitt and Angluin. We return in this paper to a definition proposed by Gold that requires a characteristic set of strings to exist for each grammar, and this set to be polynomial in the size of the grammar or automaton that is to be learned, where the size of the sample is the sum of the lengths of all strings it includes. The learning algorithm must also infer correctly as soon as the characteristic set is included in the data. We first show that this definition corresponds to a notion of teachability as defined by Goldman and Mathias. By adapting their teacher/learner model to grammatical inference we prove that languages given by context-free grammars, simple deterministic grammars, linear grammars and nondeterministic finite automata are not identifiable in the limit from polynomial time and data.
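
The two conditions in the abstract's definition can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the example strings, the grammar size of 3 and the quadratic bound are all hypothetical, and samples are simplified to plain sets of strings. It shows only how the size of a sample is measured (the sum of the lengths of its strings) and what it means for a characteristic set to be included in the data.

```python
def sample_size(sample):
    """Size of a sample: the sum of the lengths of all strings it includes."""
    return sum(len(s) for s in sample)


def within_polynomial_bound(char_set, grammar_size, bound):
    """Check that the characteristic set is no larger than bound(grammar_size).

    `bound` stands in for some fixed polynomial; which polynomial is used
    is an assumption of this sketch.
    """
    return sample_size(char_set) <= bound(grammar_size)


def covers(data, char_set):
    """True once every string of the characteristic set appears in the data."""
    return set(char_set) <= set(data)


# Hypothetical example values, chosen only for illustration.
char_set = {"a", "ab", "abb"}
data = {"a", "ab", "abb", "ba", "bbbb"}

print(sample_size(char_set))                                  # 1 + 2 + 3 = 6
print(within_polynomial_bound(char_set, 3, lambda n: n ** 2))  # 6 <= 9
print(covers(data, char_set))
```

Under the definition, a learner is required to return a correct hypothesis for any data on which `covers` holds. The sketch does not model the learner itself.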