This is the second of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. The first paper discussed a particular set of ways to compare learning algorithms, according to which there are no distinctions between learning algorithms. This second paper concentrates on different ways of comparing learning algorithms and, in particular, on the associated a priori distinctions that do exist between them. It is shown, loosely speaking, that for loss functions other than zero-one (e.g., quadratic loss) there are a priori distinctions between algorithms. However, even for such loss functions, any algorithm is equivalent on average to its “randomized” version, and in this sense still has no first-principles justification in terms of average error. Nonetheless, as this paper discusses, it may be that cross-validation, for example, has better head-to-head minimax properties than “anti-cross-validation” (choosing the learning algorithm with the largest cross-validation error). This may be true even for zero-one loss, a loss function for which the notion of “randomization” is not relevant. This paper also analyzes averages over hypotheses rather than over targets. Such analyses hold for all possible priors over targets. Accordingly, they prove, as a particular example, that cross-validation cannot be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-cross-validation rather than cross-validation (!).
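As a quick numerical illustration of the zero-one-loss claim above, here is a minimal sketch, not code from the paper: it enumerates every Boolean target on a six-point input space under a uniform prior and compares cross-validation against anti-cross-validation on mean OTS zero-one error. The two toy learners and all helper names (majority_vote, minority_vote, loo_error) are hypothetical choices made for this example.

# Toy sanity check of the zero-one-loss no-free-lunch claim: averaged
# uniformly over all Boolean targets, cross-validation and
# anti-cross-validation achieve identical mean off-training-set error.
# This is an illustrative construction, not the paper's formalism.
from itertools import product

X = range(6)              # tiny input space
TRAIN = [0, 1, 2, 3]      # fixed training inputs; the rest are off-training-set
OTS = [x for x in X if x not in TRAIN]

def majority_vote(labels):
    """Learner A: always predict the training set's majority label."""
    return 1 if sum(labels) * 2 >= len(labels) else 0

def minority_vote(labels):
    """Learner B: always predict the training set's minority label."""
    return 1 - majority_vote(labels)

def loo_error(learner, labels):
    """Leave-one-out zero-one error of `learner` on the training labels."""
    errs = 0
    for i in range(len(labels)):
        held_out = labels[i]
        rest = labels[:i] + labels[i + 1:]
        errs += int(learner(rest) != held_out)
    return errs / len(labels)

cv_total = anti_cv_total = 0.0
targets = list(product([0, 1], repeat=len(X)))   # all 2^6 targets, uniform prior

for f in targets:
    train_labels = [f[x] for x in TRAIN]
    errA = loo_error(majority_vote, train_labels)
    errB = loo_error(minority_vote, train_labels)
    # Cross-validation picks the lower-LOO-error learner;
    # anti-cross-validation deliberately picks the higher one.
    cv_pick = majority_vote if errA <= errB else minority_vote
    anti_pick = minority_vote if errA <= errB else majority_vote
    cv_total += sum(cv_pick(train_labels) != f[x] for x in OTS) / len(OTS)
    anti_cv_total += sum(anti_pick(train_labels) != f[x] for x in OTS) / len(OTS)

print(f"mean OTS error, cross-validation:      {cv_total / len(targets):.4f}")
print(f"mean OTS error, anti-cross-validation: {anti_cv_total / len(targets):.4f}")

On this toy problem both selectors come out at exactly 0.5 mean OTS error: under a uniform prior over targets, the OTS labels are independent of the training labels, so no rule for choosing between the learners can do better on average under zero-one loss. Note this says nothing against the paper's separate minimax point, which concerns worst-case rather than average behavior.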