We study a learning problem that allows for a "fair" comparison between unsupervised learning methods that construct a probabilistic model and more traditional algorithms that directly learn a classification. The merits of each approach are intuitively clear: inducing a model is computationally more expensive, but may support a wider range of predictions. Its performance, however, depends on how well the postulated probabilistic model fits the data. To compare the paradigms we consider a model which postulates a single binary-valued hidden variable on which all other attributes depend. In this model, finding the most likely value of any one variable (given known values for the others) reduces to testing a linear function of the observed values. We learn the model with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly. Our conclusions help delimit the fragility of using a model that is even "slightly" simpler than the distribution actually generating the data, versus the relative robustness of directly searching for a good predictor.
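To make the reduction concrete, the sketch below (not taken from the paper; all parameter values are invented for illustration) builds the kind of model the abstract describes: a single binary hidden variable H on which binary attributes X_1,...,X_n depend, conditionally independently. It checks numerically that predicting the most likely value of one attribute X_j from the remaining attributes, by exact Bayesian inference in this model, coincides with a linear-threshold test on the observed values.

```python
# Minimal sketch, assuming a two-component product (naive-Bayes-style) model
# with a single binary hidden variable H and binary attributes X_1..X_n.
# Parameters below are made up; this is illustrative, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)
n = 6                                     # number of observed attributes
prior_h1 = 0.4                            # P(H = 1)
p = rng.uniform(0.1, 0.9, size=(2, n))    # p[h, i] = P(X_i = 1 | H = h)

def posterior_h1(x, j):
    """P(H = 1 | all attributes except X_j), by direct computation."""
    logit = np.log(prior_h1 / (1 - prior_h1))
    for i in range(n):
        if i == j:
            continue
        if x[i] == 1:
            logit += np.log(p[1, i] / p[0, i])
        else:
            logit += np.log((1 - p[1, i]) / (1 - p[0, i]))
    return 1.0 / (1.0 + np.exp(-logit))

def predict_xj_bayes(x, j):
    """Most likely value of X_j given the other attributes (exact inference)."""
    q1 = posterior_h1(x, j)
    return int(q1 * p[1, j] + (1 - q1) * p[0, j] > 0.5)

def predict_xj_linear(x, j):
    """The same prediction as a linear-threshold test on x_{-j}.

    The posterior log-odds of H = 1 is a linear form w.x_{-j} + b, and
    P(X_j = 1 | x_{-j}) is monotone in it, so thresholding the probability
    at 1/2 is a threshold on the linear form."""
    w = np.log(p[1] / p[0]) - np.log((1 - p[1]) / (1 - p[0]))
    b = np.log(prior_h1 / (1 - prior_h1)) + sum(
        np.log((1 - p[1, i]) / (1 - p[0, i])) for i in range(n) if i != j)
    score = sum(w[i] * x[i] for i in range(n) if i != j) + b
    lo, hi = sorted((p[0, j], p[1, j]))
    if hi <= 0.5:                 # mixture can never exceed 1/2
        return 0
    if lo >= 0.5:                 # mixture always at least 1/2
        return 1
    q_star = (0.5 - p[0, j]) / (p[1, j] - p[0, j])   # crossing point, in (0, 1)
    threshold = np.log(q_star / (1 - q_star))
    return int(score > threshold) if p[1, j] > p[0, j] else int(score < threshold)

# Sanity check: exact inference and the linear-threshold test agree.
for _ in range(200):
    h = int(rng.random() < prior_h1)
    x = (rng.random(n) < p[h]).astype(int)
    j = int(rng.integers(n))
    assert predict_xj_bayes(x, j) == predict_xj_linear(x, j)
print("exact inference and the linear-threshold test agree")
```

Because prediction in this model is a linear-threshold function, a mistake-bounded linear-classifier learner such as Winnow can be trained on the same task directly, which is what makes the head-to-head comparison in the paper "fair."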