Language learning from stochastic input

  • Authors:
  • Shyam Kapur
  • Gianfranco Bilardi

  • Affiliations:
  • Institute for Research in Cognitive Science, University of Pennsylvania, 3401 Walnut Street, Rm 412C, Philadelphia, PA
  • Dipartimento di Elettronica ed Informatica, Università di Padova, Via Gradenigo 6/A, 35131 Padova, Italy

  • Venue:
  • COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
  • Year:
  • 1992

Abstract

Language learning from positive data in the Gold model of inductive inference is investigated in a setting where the data can be modeled as a stochastic process. Specifically, the input strings are assumed to form a sequence of independent, identically distributed random variables, where the distribution depends on the language being presented. A scheme is developed that can be tuned to learn, with probability one, any family of recursive languages, given a recursive enumeration of total indices for the languages in the family and a procedure to compute a lower bound on the probability of occurrence of a given string in a given language. Variations of the scheme work under other assumptions, e.g., if the probabilities of the strings form a monotone sequence with respect to a given enumeration. The learning algorithm is rather simple and appears psychologically plausible. A more sophisticated version of the learner is also developed, based on a probabilistic version of the notion of tell-tale subset. This version yields, as a special case, Angluin's learner for the families of languages that are learnable from all texts (and not just from a set of texts of probability one).
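To illustrate the flavor of the scheme, here is a minimal toy sketch (not the paper's actual algorithm): languages are modeled as finite sets of strings with known lower bounds on string probabilities, and the learner guesses the first language in the enumeration that is consistent with the data and whose sufficiently probable strings have all appeared — a crude stand-in for the probabilistic tell-tale test. All names and thresholds below are illustrative assumptions.

```python
import random

# Hypothetical toy family: each language maps its strings to a known
# lower bound on that string's probability of occurrence.
FAMILY = [
    {"a": 0.5, "b": 0.5},              # L0
    {"a": 0.4, "b": 0.4, "c": 0.2},    # L1 (a proper superset of L0)
]

def learner(stream, family, horizon=200):
    """After each input string, guess the first language in the
    enumeration that (1) contains every string seen so far and
    (2) has had all of its strings with lower bound >= 1/horizon
    observed already.  Yields the current guess (an index, or None)."""
    seen = set()
    guess = None
    for s in stream:
        seen.add(s)
        for i, lang in enumerate(family):
            consistent = seen <= set(lang)
            expected = {w for w, p in lang.items() if p >= 1.0 / horizon}
            if consistent and expected <= seen:
                guess = i
                break
        yield guess

# Present L1 as an i.i.d. stream and watch the guess stabilize on index 1.
random.seed(0)
target = FAMILY[1]
words, probs = zip(*target.items())
stream = random.choices(words, weights=probs, k=300)
guesses = list(learner(stream, FAMILY))
print(guesses[-1])
```

Note the behavior on this family: until "c" occurs, the data are also consistent with L0, so the learner may conjecture L0; with probability one a "c" eventually arrives and forces the switch to L1, mirroring how the paper's learner converges on a set of texts of probability one rather than on all texts.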