On diversity and accuracy of homogeneous and heterogeneous ensembles

Authors:
Shun Bian;Wenjia Wang
Affiliations:
School of Computing Sciences, University of East Anglia, Norwich, UK;School of Computing Sciences, University of East Anglia, Norwich, UK (corresponding author: wjw@cmp.uea.ac.uk)
Venue:
International Journal of Hybrid Intelligent Systems
Year:
2007

Abstract

Ensemble learning is increasingly used in data mining to improve predictive performance. However, the performance gain appears to vary considerably from application to application, and in some cases little or no gain is achieved even when the same ensemble paradigm is used. This suggests that some fundamental issues in ensemble methodology are still not well understood, especially the factors that affect the performance of an ensemble and the strategies for constructing effective ensembles. This paper attempts to address these issues. It first describes the possible influencing factors and then focuses on the most important one, diversity, and its relationship with ensemble accuracy. Two types of ensemble, homogeneous and heterogeneous, are defined and constructed using ten different learning algorithms, and their diversity and accuracy are evaluated to find out which type possesses higher diversity and is thus more accurate. For each of the ten learning algorithms, its ability to generate different types of diversity is estimated quantitatively using ten common diversity measures, and their characteristics are then analyzed to establish their correlation with ensemble performance. Fifteen popular data sets are used to verify the consistency and reliability of the experimental findings.
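
As an illustration of what a pairwise diversity measure computes, the sketch below implements two measures that are common in the ensemble literature: the disagreement measure and Yule's Q statistic. The abstract does not name the ten measures used in the study, so the choice of these two, the function names, and the toy correctness vectors are assumptions made for illustration only and are not taken from the paper.

```python
# Illustrative sketch only: two common pairwise diversity measures.
# The measures chosen, all names, and the toy data are assumptions;
# the abstract does not list the ten measures evaluated in the study.

def pairwise_counts(correct_a, correct_b):
    """Count joint outcomes of two classifiers on the same instances.

    Each input is a sequence of 0/1 flags, where 1 means the classifier
    labelled that instance correctly.
    Returns (n11, n10, n01, n00): both correct, only A correct,
    only B correct, both wrong.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both classifiers must be scored on the same instances")
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif a and not b:
            n10 += 1
        elif b and not a:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def disagreement(correct_a, correct_b):
    """Fraction of instances on which exactly one classifier is correct.

    Ranges from 0 (identical behaviour) to 1; higher means more diverse.
    """
    n11, n10, n01, n00 = pairwise_counts(correct_a, correct_b)
    return (n10 + n01) / (n11 + n10 + n01 + n00)


def q_statistic(correct_a, correct_b):
    """Yule's Q statistic, in [-1, 1]; lower values indicate more diversity.

    Returns NaN when the denominator is zero (the measure is undefined).
    """
    n11, n10, n01, n00 = pairwise_counts(correct_a, correct_b)
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        return float("nan")
    return (n11 * n00 - n01 * n10) / denominator


# Toy correctness vectors for two hypothetical ensemble members.
member_a = [1, 1, 1, 0, 0, 1, 1, 0]
member_b = [1, 0, 1, 1, 0, 1, 0, 1]

dis = disagreement(member_a, member_b)
q = q_statistic(member_a, member_b)
print(f"disagreement = {dis:.3f}, Q = {q:.3f}")
```

For an ensemble with more than two members, such pairwise values are usually averaged over all pairs of members to give a single diversity score, which can then be compared with the ensemble's accuracy.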