Learning in natural language

Authors:
Dan Roth
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2
Year:
1999

Abstract

Statistics-based classifiers in natural language are typically developed by assuming a generative model for the data, estimating its parameters from training data, and then using Bayes' rule to obtain a classifier. For many problems the assumptions made by the generative models are evidently wrong, leaving open the question of why these approaches work. This paper presents a learning theory account of the major statistical approaches to learning in natural language. A class of Linear Statistical Queries (LSQ) hypotheses is defined, and learning with it is shown to exhibit some robustness properties. Many statistical learners used in natural language, including naive Bayes, Markov models, and maximum entropy models, are shown to be LSQ hypotheses, which explains the robustness of these predictors even when the underlying probabilistic assumptions do not hold. This coherent view of when and why learning approaches work in this context may help to develop better learning methods and an understanding of the role of learning in natural language inferences.
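
As a rough illustration of the claim that naive Bayes can be read as a linear predictor whose coefficients come from simple statistics of the data, the sketch below trains a two-class naive Bayes model on binary features and then rewrites it as a linear threshold function. It is not the paper's formal LSQ definition: the function names, the add-one smoothing, and the toy dataset are all assumptions made for this example. The only point it shows is the standard identity that the naive Bayes log-odds is a weighted sum of the features, with each weight computed from estimated conditional probabilities.

```python
import math
from itertools import product

def estimate(data, labels, n_features):
    """Estimate P(y) and P(x_i = 1 | y) from counts (add-one smoothing is an assumption)."""
    counts = {0: 0, 1: 0}
    feat = {0: [0] * n_features, 1: [0] * n_features}
    for x, y in zip(data, labels):
        counts[y] += 1
        for i, v in enumerate(x):
            feat[y][i] += v
    total = len(labels)
    prior = {y: (counts[y] + 1) / (total + 2) for y in (0, 1)}
    cond = {y: [(feat[y][i] + 1) / (counts[y] + 2) for i in range(n_features)]
            for y in (0, 1)}
    return prior, cond

def nb_log_score(x, y, prior, cond):
    """log P(y) + sum_i log P(x_i | y) under the naive Bayes independence assumption."""
    s = math.log(prior[y])
    for v, p in zip(x, cond[y]):
        s += math.log(p) if v else math.log(1 - p)
    return s

def nb_predict(x, prior, cond):
    """Usual generative decision rule: pick the class with the larger log score."""
    return 1 if nb_log_score(x, 1, prior, cond) > nb_log_score(x, 0, prior, cond) else 0

def linear_form(prior, cond):
    """Rewrite the naive Bayes log-odds as b + sum_i w_i * x_i."""
    w = [math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))
         for p1, p0 in zip(cond[1], cond[0])]
    b = math.log(prior[1] / prior[0]) + sum(
        math.log((1 - p1) / (1 - p0)) for p1, p0 in zip(cond[1], cond[0]))
    return w, b

def linear_predict(x, w, b):
    """Linear threshold decision using the coefficients from linear_form."""
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Hypothetical toy data: three binary features, two classes.
data = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1), (0, 1, 1), (0, 0, 0)]
labels = [1, 1, 1, 0, 0, 0]

prior, cond = estimate(data, labels, 3)
w, b = linear_form(prior, cond)

# The two decision rules agree on every possible input in this toy setting.
agree = all(nb_predict(x, prior, cond) == linear_predict(x, w, b)
            for x in product((0, 1), repeat=3))
print(agree)
```

The weights here depend on the data only through estimated probabilities of simple events (a feature being active together with a label), which is the sense in which the sketch echoes the abstract's "statistical queries" framing. Nothing in the linear decision rule itself requires the independence assumption to be true of the data.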