We study a learning problem that allows for a "fair" comparison between unsupervised learning methods that construct a probabilistic model and more traditional algorithms that directly learn a classification. The merits of each approach are intuitively clear: inducing a model is computationally more expensive, but may support a wider range of predictions. Its performance, however, depends on how well the postulated probabilistic model fits the data. To compare the paradigms we consider a model which postulates a single binary-valued hidden variable on which all other attributes depend. In this model, finding the most likely value of any one variable (given known values for the others) reduces to testing a linear function of the observed values. We learn the model with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly. Our conclusions help delimit the fragility of using a model that is even "slightly" simpler than the distribution actually generating the data, versus the relative robustness of directly searching for a good predictor.
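To make the reduction concrete, the sketch below (not taken from the paper; all parameter values are invented for illustration) builds the kind of model the abstract describes: a single binary hidden variable H on which binary attributes X_1,...,X_n depend, conditionally independently. It checks numerically that predicting the most likely value of one attribute X_j from the remaining attributes, by exact Bayesian inference in this model, coincides with a linear-threshold test on the observed values.

```python
# Minimal sketch, assuming a two-component product (naive-Bayes-style) model
# with a single binary hidden variable H and binary attributes X_1..X_n.
# Parameters below are made up; this is illustrative, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)
n = 6                                     # number of observed attributes
prior_h1 = 0.4                            # P(H = 1)
p = rng.uniform(0.1, 0.9, size=(2, n))    # p[h, i] = P(X_i = 1 | H = h)

def posterior_h1(x, j):
    """P(H = 1 | all attributes except X_j), by direct computation."""
    logit = np.log(prior_h1 / (1 - prior_h1))
    for i in range(n):
        if i == j:
            continue
        if x[i] == 1:
            logit += np.log(p[1, i] / p[0, i])
        else:
            logit += np.log((1 - p[1, i]) / (1 - p[0, i]))
    return 1.0 / (1.0 + np.exp(-logit))

def predict_xj_bayes(x, j):
    """Most likely value of X_j given the other attributes (exact inference)."""
    q1 = posterior_h1(x, j)
    return int(q1 * p[1, j] + (1 - q1) * p[0, j] > 0.5)

def predict_xj_linear(x, j):
    """The same prediction as a linear-threshold test on x_{-j}.

    The posterior log-odds of H = 1 is a linear form w.x_{-j} + b, and
    P(X_j = 1 | x_{-j}) is monotone in it, so thresholding the probability
    at 1/2 is a threshold on the linear form."""
    w = np.log(p[1] / p[0]) - np.log((1 - p[1]) / (1 - p[0]))
    b = np.log(prior_h1 / (1 - prior_h1)) + sum(
        np.log((1 - p[1, i]) / (1 - p[0, i])) for i in range(n) if i != j)
    score = sum(w[i] * x[i] for i in range(n) if i != j) + b
    lo, hi = sorted((p[0, j], p[1, j]))
    if hi <= 0.5:                 # mixture can never exceed 1/2
        return 0
    if lo >= 0.5:                 # mixture always at least 1/2
        return 1
    q_star = (0.5 - p[0, j]) / (p[1, j] - p[0, j])   # crossing point, in (0, 1)
    threshold = np.log(q_star / (1 - q_star))
    return int(score > threshold) if p[1, j] > p[0, j] else int(score < threshold)

# Sanity check: exact inference and the linear-threshold test agree.
for _ in range(200):
    h = int(rng.random() < prior_h1)
    x = (rng.random(n) < p[h]).astype(int)
    j = int(rng.integers(n))
    assert predict_xj_bayes(x, j) == predict_xj_linear(x, j)
print("exact inference and the linear-threshold test agree")
```

Because prediction in this model is a linear-threshold function, a mistake-bounded linear-classifier learner such as Winnow can be trained on the same task directly, which is what makes the head-to-head comparison in the paper "fair."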