This is the second of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. The first paper discussed a particular set of ways to compare learning algorithms, according to which there are no distinctions between learning algorithms. This second paper concentrates on different ways of comparing learning algorithms and, in particular, on the associated a priori distinctions that do exist between them. It is shown, loosely speaking, that for loss functions other than zero-one (e.g., quadratic loss) there are a priori distinctions between algorithms. However, even for such loss functions, any algorithm is equivalent on average to its “randomized” version, and in this sense still has no first-principles justification in terms of average error. Nonetheless, as this paper discusses, it may be that cross-validation, for example, has better head-to-head minimax properties than “anti-cross-validation” (choosing the learning algorithm with the largest cross-validation error). This may be true even for zero-one loss, a loss function for which the notion of “randomization” is not relevant. This paper also analyzes averages over hypotheses rather than over targets. Such analyses hold for all possible priors over targets. Accordingly, they prove, as a particular example, that cross-validation cannot be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-cross-validation rather than cross-validation (!).
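As a quick numerical illustration of the zero-one-loss claim above, here is a minimal sketch, not code from the paper: it enumerates every Boolean target on a six-point input space under a uniform prior and compares cross-validation against anti-cross-validation on mean OTS zero-one error. The two toy learners and all helper names (majority_vote, minority_vote, loo_error) are hypothetical choices made for this example.

# Toy sanity check of the zero-one-loss no-free-lunch claim: averaged
# uniformly over all Boolean targets, cross-validation and
# anti-cross-validation achieve identical mean off-training-set error.
# This is an illustrative construction, not the paper's formalism.
from itertools import product

X = range(6)              # tiny input space
TRAIN = [0, 1, 2, 3]      # fixed training inputs; the rest are off-training-set
OTS = [x for x in X if x not in TRAIN]

def majority_vote(labels):
    """Learner A: always predict the training set's majority label."""
    return 1 if sum(labels) * 2 >= len(labels) else 0

def minority_vote(labels):
    """Learner B: always predict the training set's minority label."""
    return 1 - majority_vote(labels)

def loo_error(learner, labels):
    """Leave-one-out zero-one error of `learner` on the training labels."""
    errs = 0
    for i in range(len(labels)):
        held_out = labels[i]
        rest = labels[:i] + labels[i + 1:]
        errs += int(learner(rest) != held_out)
    return errs / len(labels)

cv_total = anti_cv_total = 0.0
targets = list(product([0, 1], repeat=len(X)))   # all 2^6 targets, uniform prior

for f in targets:
    train_labels = [f[x] for x in TRAIN]
    errA = loo_error(majority_vote, train_labels)
    errB = loo_error(minority_vote, train_labels)
    # Cross-validation picks the lower-LOO-error learner;
    # anti-cross-validation deliberately picks the higher one.
    cv_pick = majority_vote if errA <= errB else minority_vote
    anti_pick = minority_vote if errA <= errB else majority_vote
    cv_total += sum(cv_pick(train_labels) != f[x] for x in OTS) / len(OTS)
    anti_cv_total += sum(anti_pick(train_labels) != f[x] for x in OTS) / len(OTS)

print(f"mean OTS error, cross-validation:      {cv_total / len(targets):.4f}")
print(f"mean OTS error, anti-cross-validation: {anti_cv_total / len(targets):.4f}")

On this toy problem both selectors come out at exactly 0.5 mean OTS error: under a uniform prior over targets, the OTS labels are independent of the training labels, so no rule for choosing between the learners can do better on average under zero-one loss. Note this says nothing against the paper's separate minimax point, which concerns worst-case rather than average behavior.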