Inference on the prediction of ensembles of infinite size

  • Authors:
  • Daniel Hernández-Lobato
  • Gonzalo Martínez-Muñoz
  • Alberto Suárez

  • Affiliations:
  • Machine Learning Group, ICTEAM Institute, Université catholique de Louvain, Place Sainte Barbe 2, B-1348 Louvain-la-Neuve, Belgium
  • Computer Science Department, Escuela Politécnica Superior, Universidad Autónoma de Madrid, C/Francisco Tomás y Valiente, 11, Madrid 28049, Spain
  • Computer Science Department, Escuela Politécnica Superior, Universidad Autónoma de Madrid, C/Francisco Tomás y Valiente, 11, Madrid 28049, Spain

  • Venue:
  • Pattern Recognition
  • Year:
  • 2011

Abstract

In this paper we introduce a framework for performing statistical inference on the asymptotic prediction of parallel classification ensembles. The validity of the analysis is fairly general: it only requires that the individual classifiers be generated in independent executions of some randomized learning algorithm, and that the final ensemble prediction be made via majority voting. Given an unlabeled test instance, the predictions of the classifiers in the ensemble are obtained sequentially. As the individual predictions become known, Bayes' theorem is used to update an estimate of the probability that the class predicted by the current ensemble coincides with the classification given by the corresponding ensemble of infinite size. Using this estimate, the voting process can be halted when the confidence in the asymptotic prediction is sufficiently high. An empirical investigation on several benchmark classification problems shows that most test instances require querying only a small number of classifiers to converge to the infinite-ensemble prediction with a high degree of confidence. For these instances, the difference between the generalization error of the finite ensemble and the infinite-ensemble limit is very small, often negligible.
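To illustrate the idea behind the sequential voting scheme described above, the following Python sketch handles the binary case. It is an illustrative sketch under simplifying assumptions, not the authors' implementation: the probability p that a randomly drawn ensemble member votes for class 1 is given a symmetric Beta prior, the posterior is updated as votes arrive, and querying stops once the posterior probability that the current majority class coincides with the infinite-ensemble prediction (i.e., P(p > 1/2) or its complement) reaches a confidence threshold. The sklearn-style classifier objects, the Beta(1, 1) prior, and the 0.99 threshold are assumptions chosen for the example.

```python
from scipy.stats import beta

def query_until_confident(classifiers, x, prior=1.0, threshold=0.99):
    """Sequentially query base classifiers on instance x (binary labels 0/1)
    and stop as soon as the current majority vote agrees with the
    infinite-ensemble prediction with posterior probability >= threshold."""
    votes_for_1 = 0
    confidence = 0.5  # no votes observed yet
    t = 0
    for t, clf in enumerate(classifiers, start=1):
        votes_for_1 += int(clf.predict([x])[0] == 1)  # assumes sklearn-style .predict
        votes_for_0 = t - votes_for_1
        # Beta posterior over p = P(a random ensemble member votes class 1),
        # starting from a symmetric Beta(prior, prior).
        posterior = beta(prior + votes_for_1, prior + votes_for_0)
        p_majority_1 = posterior.sf(0.5)              # P(p > 1/2 | votes so far)
        confidence = max(p_majority_1, 1.0 - p_majority_1)
        if confidence >= threshold:
            break                                     # asymptotic prediction is settled
    prediction = int(votes_for_1 > t - votes_for_1)
    return prediction, confidence, t
```

In the multi-class setting treated in the paper, the Beta posterior would be replaced by a Dirichlet posterior over the class-vote probabilities, and the quantity of interest becomes the posterior probability that the currently leading class has the largest vote probability.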