Sharing Classifiers among Ensembles from Related Problem Domains

Authors:
Yi Zhang;W. Nick Street;Samuel Burer
Affiliations:
University of Iowa;University of Iowa;University of Iowa
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Abstract

A classification ensemble is a group of classifiers that all solve the same prediction problem in different ways. It is well known that combining the predictions of classifiers within the same problem domain, using techniques such as bagging or boosting, often improves predictive performance. This research shows that sharing classifiers among different but closely related problem domains can also help. In addition, an ensemble pruning method based on semidefinite programming is implemented to optimize the selection of a subset of classifiers for each problem domain. Computational results on a catalog dataset indicate that ensembles that share classifiers among different product categories generally achieve larger AUCs than ensembles trained only on their own categories. The pruning algorithm not only prevents the occasional loss of effectiveness caused by conflicting concepts among the problem domains, but also provides a better understanding of the problem domains and their relationships.
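The idea of sharing and then pruning classifiers can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy forward selection below is a simple stand-in for the semidefinite-programming-based pruning described in the abstract, and the classifier names, scores, and labels are invented toy values chosen only to show the effect.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def ensemble_scores(classifiers, rows):
    """Combine classifiers by averaging their scores for each row."""
    return [sum(clf(r) for clf in classifiers) / len(classifiers)
            for r in rows]


def greedy_prune(pool, rows, labels, k):
    """Pick up to k classifiers by forward selection on ensemble AUC.

    Assumption: a greedy stand-in for SDP-based pruning, not the
    method used in the paper.
    """
    selected, remaining = [], list(pool)

    def auc_with(name):
        members = [pool[n] for n in selected + [name]]
        return auc(ensemble_scores(members, rows), labels)

    while remaining and len(selected) < k:
        best = max(remaining, key=auc_with)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical validation data for one target domain. Each row holds
# the scores that three already-trained classifiers assign to it.
rows = [(9, 3, 2), (4, 8, 5), (5, 2, 7), (1, 4, 8)]
labels = [1, 1, 0, 0]
pool = {
    "own_domain": lambda r: r[0],          # trained on the target domain
    "related_domain": lambda r: r[1],      # shared from a related domain
    "conflicting_domain": lambda r: r[2],  # shared, but concept conflicts
}

own_auc = auc(ensemble_scores([pool["own_domain"]], rows), labels)
shared_auc = auc(ensemble_scores(list(pool.values()), rows), labels)
kept = greedy_prune(pool, rows, labels, k=2)
pruned_auc = auc(ensemble_scores([pool[n] for n in kept], rows), labels)

print(own_auc, shared_auc, kept, pruned_auc)
# 0.75 0.875 ['own_domain', 'related_domain'] 1.0
```

On this toy data, sharing all three classifiers raises the AUC above that of the own-domain classifier alone, and pruning the conflicting classifier raises it further. This mirrors the qualitative pattern the abstract reports; it is not a reproduction of the paper's results.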