Ensemble pruning for text categorization based on data partitioning

Authors:
Cagri Toraman;Fazli Can
Affiliations:
Bilkent Information Retrieval Group, Computer Engineering Department, Bilkent University, Ankara, Turkey;Bilkent Information Retrieval Group, Computer Engineering Department, Bilkent University, Ankara, Turkey
Venue:
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Year:
2011

Abstract

Ensemble methods can improve the effectiveness of text categorization. Because ensemble approaches are computationally costly, there is a need to prune ensembles. In this work, we study ensemble pruning based on data partitioning. We use a rank-based pruning approach: base classifiers are ranked and pruned according to their accuracies on a separate validation set. We employ four data partitioning methods with four machine learning categorization algorithms. Our main aim is to examine ensemble pruning in text categorization. We conduct experiments on two text collections: Reuters-21578 and BilCat-TRT. We show that we can prune 90% of ensemble members with almost no decrease in accuracy. We also demonstrate that ensemble pruning can increase accuracy over traditional (unpruned) ensembling.
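The rank-based pruning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`accuracy`, `prune_by_rank`, `majority_vote`), the toy keyword classifiers, the tiny validation set, and the use of majority voting to combine the surviving members are all assumptions made for illustration.

```python
from collections import Counter

def accuracy(classifier, examples):
    """Fraction of (document, label) pairs the classifier labels correctly."""
    correct = sum(1 for doc, label in examples if classifier(doc) == label)
    return correct / len(examples)

def prune_by_rank(classifiers, validation_set, keep_fraction=0.1):
    """Rank base classifiers by validation accuracy and keep the top fraction."""
    ranked = sorted(classifiers,
                    key=lambda c: accuracy(c, validation_set),
                    reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))  # always keep at least one
    return ranked[:n_keep]

def majority_vote(classifiers, doc):
    """Combine member predictions by simple voting (an assumed combination rule)."""
    votes = Counter(c(doc) for c in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical base classifiers standing in for models trained on data partitions.
def clf_a(doc):
    return "economy" if ("stocks" in doc or "bank" in doc) else "sports"

def clf_b(doc):
    return "economy" if "bank" in doc else "sports"

def clf_c(doc):
    return "economy"

def clf_d(doc):
    return "sports" if "bank" in doc else "economy"

# Toy validation set, kept separate from whatever data trained the members.
validation_set = [
    ("stocks rise on earnings", "economy"),
    ("team wins final match", "sports"),
    ("central bank cuts rates", "economy"),
    ("striker scores twice", "sports"),
]

ensemble = [clf_d, clf_c, clf_b, clf_a]
pruned = prune_by_rank(ensemble, validation_set, keep_fraction=0.5)
prediction = majority_vote(pruned, "bank raises rates")
```

With `keep_fraction=0.1`, the same function would keep one member out of ten, which corresponds to the 90% pruning level mentioned in the abstract. How the members are trained and how their votes are combined are left open here, since the abstract does not specify them.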