Machine Learning
ICPR '06 Proceedings of the 18th International Conference on Pattern Recognition - Volume 02
Over-Fitting in ensembles of neural network classifiers within ECOC frameworks
MCS'05 Proceedings of the 6th international conference on Multiple Classifier Systems
Dynamics of variance reduction in bagging and other techniques based on randomisation
MCS'05 Proceedings of the 6th international conference on Multiple Classifier Systems
Ensemble methods in supervised classification have been shown to outperform single base classifiers of comparable performance, particularly when multi-layer perceptrons are used as the base classifiers. An ensemble's performance depends on both the accuracy and the diversity of its component classifiers. Intuitively, diversity seems a desirable quality for a collection of non-optimal classifiers. Despite much interest in diversity, little progress has been made in linking generalisation performance to any specific diversity metric. It can be shown theoretically that, by combining even modestly accurate, statistically independent classifiers, ensemble accuracy can be forced close to optimality. Real-world ensembles nevertheless fall far short of this theoretical benchmark. The root of the problem is the lack of statistical independence amongst the base classifiers. We investigate a measure of statistical dependence in ensembles, D, and its relationship to the Q diversity metric and to pairwise correlation, and we also examine voting patterns in real-world ensembles. We show that, whilst Q is relatively insensitive to changes in the ensemble configuration, D measures correlations between the base classifiers effectively. The experiments are based on several two-class problems from the UCI data sets and use bootstrapped ensembles of relatively weak multi-layer perceptron base classifiers.
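As a rough illustration of two quantities mentioned in the abstract, the sketch below computes (a) the majority-vote accuracy of an ensemble under the idealised assumption of statistically independent base classifiers with equal accuracy, and (b) the standard pairwise Q statistic (Yule's Q) and correlation coefficient from two classifiers' correct/incorrect records. The function names and the toy inputs are assumptions made for this example, not taken from the paper, and the dependence measure D is not sketched because the abstract does not define it.

```python
from math import comb, sqrt


def majority_vote_accuracy(p, n):
    """Accuracy of a majority vote over n independent classifiers.

    Assumes (idealised) that each classifier is correct with the same
    probability p, independently of the others, and that n is odd.
    """
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def pairwise_counts(a, b):
    """Count joint outcomes for two classifiers.

    a and b are equal-length sequences of 1 (correct) / 0 (incorrect),
    one entry per test pattern.
    """
    n11 = sum(1 for x, y in zip(a, b) if x and y)          # both correct
    n10 = sum(1 for x, y in zip(a, b) if x and not y)      # only a correct
    n01 = sum(1 for x, y in zip(a, b) if not x and y)      # only b correct
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)  # both wrong
    return n11, n10, n01, n00


def q_statistic(a, b):
    """Yule's Q for a pair of classifiers; NaN if the denominator is zero."""
    n11, n10, n01, n00 = pairwise_counts(a, b)
    den = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / den if den else float("nan")


def pairwise_correlation(a, b):
    """Correlation coefficient between two correct/incorrect records."""
    n11, n10, n01, n00 = pairwise_counts(a, b)
    den = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n01 * n10) / den if den else float("nan")
```

Under the independence assumption, three classifiers that are each 60% accurate give a majority-vote accuracy of 0.648, and the figure keeps rising as more such classifiers are added. Correlated base classifiers do not deliver this gain, which is the gap between theory and practice that the abstract describes.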