An ensemble dependence measure

  • Authors:
  • Matthew Prior; Terry Windeatt

  • Affiliations:
  • Centre for Vision, Speech and Signal Processing, University of Surrey, Surrey, UK; Centre for Vision, Speech and Signal Processing, University of Surrey, Surrey, UK

  • Venue:
  • ICANN'07: Proceedings of the 17th International Conference on Artificial Neural Networks
  • Year:
  • 2007

Abstract

Ensemble methods in supervised classification problems have been shown to be superior to single base classifiers of comparable performance, particularly when used in conjunction with multi-layer perceptron base classifiers. An ensemble's performance is related to the accuracy and diversity of its component classifiers. Intuitively, diversity seems to be a desirable quality for a collection of non-optimal classifiers. Despite much interest in diversity, however, little progress has been made in linking generalisation performance to any specific diversity metric. It can be shown theoretically that, by agglomerating even modestly accurate but statistically independent classifiers, ensemble accuracy can be forced close to optimality. Despite this theoretical benchmark, real-world ensembles fall far short of this performance; the root of the problem is the lack of statistical independence amongst the base classifiers. We investigate a measure of statistical dependence in ensembles, D, and its relationship to the Q diversity metric and to pairwise correlation, and we also examine voting patterns in real-world ensembles. We show that, whilst Q is relatively insensitive to changes in the ensemble configuration, D measures correlations between the base classifiers effectively. The experiments are based on several two-class problems from the UCI data sets and use bootstrapped ensembles of relatively weak multi-layer perceptron base classifiers.
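
The theoretical benchmark referred to in the abstract is the classical majority-vote argument: if base classifiers err independently, even a modest individual accuracy drives the ensemble towards optimal accuracy as more voters are added. A minimal illustration of that argument (not taken from the paper, which gives no implementation):

```python
from math import comb


def majority_vote_accuracy(p: float, n_classifiers: int) -> float:
    """Probability that a strict majority vote of n independent classifiers,
    each correct with probability p, returns the correct class (odd n)."""
    needed = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k) * p**k * (1 - p) ** (n_classifiers - k)
        for k in range(needed, n_classifiers + 1)
    )


# A base accuracy of 0.6 is only modestly better than chance, yet the
# ensemble accuracy climbs towards 1 as more independent voters are added.
for n in (1, 5, 15, 51, 101):
    print(n, round(majority_vote_accuracy(0.6, n), 4))
```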
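
The Q diversity metric and the pairwise correlation mentioned in the abstract are conventionally computed from the "oracle" (correct/incorrect) outputs of a pair of classifiers; the dependence measure D itself is defined in the paper and is not reproduced here. A sketch of the standard pairwise quantities:

```python
import numpy as np


def pairwise_q_and_rho(correct_i: np.ndarray, correct_j: np.ndarray):
    """Yule's Q statistic and the correlation (phi) coefficient for two
    classifiers, given boolean arrays marking which test samples each
    classifier got right (the usual 'oracle' outputs)."""
    a = np.sum(correct_i & correct_j)     # both correct  (N11)
    b = np.sum(correct_i & ~correct_j)    # i correct, j wrong  (N10)
    c = np.sum(~correct_i & correct_j)    # i wrong, j correct  (N01)
    d = np.sum(~correct_i & ~correct_j)   # both wrong  (N00)
    q = (a * d - b * c) / (a * d + b * c)
    rho = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return q, rho


# Toy example: two classifiers that tend to fail on the same samples.
correct_1 = np.array([1, 1, 1, 0, 0, 1, 1, 0], dtype=bool)
correct_2 = np.array([1, 1, 0, 0, 0, 1, 1, 1], dtype=bool)
print(pairwise_q_and_rho(correct_1, correct_2))
```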
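
The experimental setup described (bootstrapped ensembles of relatively weak multi-layer perceptron base classifiers on two-class UCI problems) can be approximated with off-the-shelf tooling. The sketch below uses scikit-learn's bagging around a small single-hidden-layer MLP; the data set, hidden-layer size and ensemble size are illustrative assumptions, not the authors' actual configuration.

```python
from sklearn.datasets import load_breast_cancer      # a two-class UCI data set (assumed, for illustration)
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A deliberately weak base classifier: very few hidden nodes, limited training.
weak_mlp = MLPClassifier(hidden_layer_sizes=(2,), max_iter=200)

# Bootstrapped (bagged) ensemble of the weak MLPs.
ensemble = BaggingClassifier(weak_mlp, n_estimators=15, bootstrap=True, random_state=0)
ensemble.fit(X_train, y_train)

print("single MLP:", weak_mlp.fit(X_train, y_train).score(X_test, y_test))
print("ensemble  :", ensemble.score(X_test, y_test))
```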