Language learning from positive data in the Gold model of inductive inference is investigated in a setting where the data can be modeled as a stochastic process. Specifically, the input strings are assumed to form a sequence of independent, identically distributed random variables, where the distribution depends on the language being presented. A scheme is developed that can be tuned to learn, with probability one, any family of recursive languages, given a recursive enumeration of total indices for the languages in the family and a procedure to compute a lower bound on the probability of occurrence of a given string in a given language. Variations of the scheme work under other assumptions, e.g., if the probabilities of the strings form a monotone sequence with respect to a given enumeration. The learning algorithm is rather simple and appears psychologically plausible. A more sophisticated version of the learner is also developed, based on a probabilistic version of the notion of tell-tale subset. This version yields, as a special case, Angluin's learner for the families of languages that are learnable from all texts (and not just from a set of texts of probability one).
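To make the flavor of such a learner concrete, the following is a minimal sketch, not the construction from the paper: it assumes hypothetical interfaces `member(i, w)` (decidable membership in the i-th language), `prob_lower(i, w)` (the computable lower bound on a string's probability), and `enum_strings(n)` (a growing window over a fixed enumeration of all strings), and a simple rejection rule of our own choosing. The idea it illustrates is that a hypothesis consistent with the sample is nevertheless rejected once the sample is large enough that some string the hypothesis predicts with non-negligible lower-bound probability has still not appeared.

```python
from typing import Callable, Iterable, List, Set

def conjecture(sample: List[str],
               num_languages: int,
               member: Callable[[int, str], bool],
               prob_lower: Callable[[int, str], float],
               enum_strings: Callable[[int], Iterable[str]]) -> int:
    """Sketch of a learner from stochastic positive data (illustrative only).

    Returns the index of the first language (within a finite prefix of the
    enumeration) that (a) contains every observed string and (b) does not
    predict, with lower-bound probability above a sample-dependent threshold,
    any string that has failed to appear in a sample of this size.
    """
    n = len(sample)
    seen: Set[str] = set(sample)
    threshold = 1.0 / (n + 1)   # strings at least this likely "should" have appeared by now
    for i in range(num_languages):
        if not all(member(i, w) for w in seen):
            continue            # inconsistent with the positive data
        suspicious = any(
            member(i, w) and w not in seen and prob_lower(i, w) >= threshold
            for w in enum_strings(n)   # only a finite, growing window of strings is checked
        )
        if not suspicious:
            return i
    return 0                    # fallback conjecture when every candidate is rejected
```

As the sample grows, the threshold shrinks and the checked window of strings widens, so an over-general but consistent hypothesis is eventually abandoned with probability one, which is the behaviour the abstract attributes to the tunable scheme.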