We consider some of our recent work on Good-Turing estimators in the larger context of learning theory and language modeling. The Good-Turing estimators have played a significant role in natural language modeling for the past twenty years. We have recently shown that these particular leave-one-out estimators converge rapidly. We present these results and consider their possible consequences for language modeling in general. In particular, other leave-one-out estimators, such as estimators of the cross-entropy of various forms of language models, might also be shown to converge rapidly using proof methods similar to those used for the Good-Turing estimators. This could have broad ramifications for the analysis and development of language modeling methods. We suggest that, in language modeling at least, leave-one-out estimation may be more significant than Occam's razor.
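To make the two estimators discussed above concrete, the following is a minimal sketch, not code from the paper: the classical Good-Turing estimate of the missing mass (the total probability of unseen words), given by the fraction of the sample consisting of words seen exactly once, and a leave-one-out estimate of the cross-entropy of a language model. The choice of an add-alpha-smoothed unigram model, and all function and parameter names, are illustrative assumptions; the paper's results concern such estimators in general, not this particular model.

```python
from collections import Counter
from math import log2

def good_turing_missing_mass(sample):
    """Good-Turing estimate of the missing mass: the total probability
    of words not seen in the sample, estimated as N1/n, where N1 is the
    number of word types occurring exactly once."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)  # singleton types
    return n1 / len(sample)

def leave_one_out_cross_entropy(sample, alpha=1.0, vocab_size=None):
    """Leave-one-out estimate of the cross-entropy (bits per word) of an
    add-alpha-smoothed unigram model: each token is scored by the model
    trained on the remaining n-1 tokens, and the negative log
    probabilities are averaged."""
    counts = Counter(sample)
    n = len(sample)
    v = vocab_size if vocab_size is not None else len(counts)
    total = 0.0
    for w in sample:
        c = counts[w] - 1  # count of w with this occurrence held out
        p = (c + alpha) / ((n - 1) + alpha * v)
        total += -log2(p)
    return total / n

if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the log".split()
    print(f"Good-Turing missing mass: {good_turing_missing_mass(text):.3f}")
    print(f"Leave-one-out cross-entropy: "
          f"{leave_one_out_cross_entropy(text):.3f} bits/word")
```

The structural similarity between the two functions is the point: both score each observation against statistics computed with that observation removed, which is the shared form that the convergence arguments for the Good-Turing estimators might plausibly be extended to cover.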