MP-Boost: a multiple-pivot boosting algorithm and its application to text categorization

Authors:
Andrea Esuli; Tiziano Fagni; Fabrizio Sebastiani
Affiliations:
Istituto di Scienza e Tecnologia dell’Informazione, Consiglio Nazionale delle Ricerche, Italy (all three authors)
Venue:
SPIRE '06: Proceedings of the 13th International Conference on String Processing and Information Retrieval
Year:
2006

Abstract

AdaBoost.MH is a popular supervised learning algorithm for building multi-label (aka n-of-m) text classifiers. AdaBoost.MH belongs to the family of “boosting” algorithms. It works by iteratively building a committee of “decision stump” classifiers, where each classifier is trained to concentrate especially on the document-class pairs that previously generated classifiers have found harder to classify correctly. Each decision stump hinges on a specific “pivot term” and makes its classification decision by checking whether that term is present in or absent from the test document. In this paper we propose an improved version of AdaBoost.MH, called MP-Boost, which selects not one but several pivot terms at each iteration of the boosting process, one for each category. The rationale behind this choice is that it provides highly individualized treatment for each category, since each iteration then generates the best possible decision stump for every category. We present the results of experiments showing that MP-Boost is much more effective than AdaBoost.MH. In particular, the improvement in effectiveness is spectacular when few boosting iterations are performed, and smaller but still high when many iterations are performed. The improvement is especially significant for macroaveraged effectiveness, which shows that MP-Boost is especially good at handling hard, infrequent categories.
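The following minimal Python sketch contrasts one boosting round under each pivot-selection strategy. It assumes the real-valued (confidence-rated) decision stumps of Schapire and Singer. In this formulation a term splits the training documents into a "term absent" block and a "term present" block. For a given category, each block outputs 0.5 * ln(W+ / W-), where W+ and W- are the total weights of positive and negative examples in that block. A stump is scored by Z = 2 * sum over blocks of sqrt(W+ * W-), and lower Z is better. AdaBoost.MH picks the single term that minimizes Z summed over all categories. The MP-Boost variant picks, for each category, the term that minimizes that category's own Z. All function names are hypothetical and chosen for illustration. The smoothing constant `EPS` and the decision to renormalize weights separately per category are also assumptions, not details taken from the paper.

```python
import math

EPS = 1e-6  # hypothetical smoothing constant: avoids log(0) when a block is pure


def stump_stats(docs, y, D, term, c):
    """Return [[W+, W-] for docs lacking term, [W+, W-] for docs containing term] on category c."""
    w = [[0.0, 0.0], [0.0, 0.0]]
    for i, doc in enumerate(docs):
        block = 1 if term in doc else 0
        w[block][0 if y[i][c] > 0 else 1] += D[i][c]
    return w


def z_score(w):
    # Z = 2 * sum_j sqrt(W+_j * W-_j); a smaller Z means a better stump
    return 2.0 * sum(math.sqrt(wp * wm) for wp, wm in w)


def stump_outputs(w):
    # Confidence-rated output per block: 0.5 * ln((W+ + EPS) / (W- + EPS))
    return [0.5 * math.log((wp + EPS) / (wm + EPS)) for wp, wm in w]


def adaboost_mh_round(docs, y, D, vocab, cats):
    # AdaBoost.MH: a single pivot term shared by all categories
    best = min(vocab, key=lambda term: sum(
        z_score(stump_stats(docs, y, D, term, c)) for c in cats))
    return {c: (best, stump_outputs(stump_stats(docs, y, D, best, c))) for c in cats}


def mp_boost_round(docs, y, D, vocab, cats):
    # MP-Boost: an independently chosen pivot term for each category
    stumps = {}
    for c in cats:
        best = min(vocab, key=lambda term: z_score(stump_stats(docs, y, D, term, c)))
        stumps[c] = (best, stump_outputs(stump_stats(docs, y, D, best, c)))
    return stumps


def update_weights(docs, y, D, stumps, cats):
    # D(i, c) <- D(i, c) * exp(-y_ic * h_c(d_i)), renormalized per category (assumption)
    for c in cats:
        term, outs = stumps[c]
        for i, doc in enumerate(docs):
            D[i][c] *= math.exp(-y[i][c] * outs[1 if term in doc else 0])
        total = sum(D[i][c] for i in range(len(docs)))
        for i in range(len(docs)):
            D[i][c] /= total


def train(docs, labels, cats, rounds, per_category_pivot=True):
    vocab = sorted(set().union(*docs))  # sorted so that ties break deterministically
    y = [{c: 1 if c in lab else -1 for c in cats} for lab in labels]
    D = [{c: 1.0 / len(docs) for c in cats} for _ in docs]
    pick = mp_boost_round if per_category_pivot else adaboost_mh_round
    committee = []
    for _ in range(rounds):
        stumps = pick(docs, y, D, vocab, cats)
        update_weights(docs, y, D, stumps, cats)
        committee.append(stumps)
    return committee


def classify(committee, doc, c):
    # Sum the per-round outputs for category c; a positive total means "c applies"
    return sum(s[c][1][1 if s[c][0] in doc else 0] for s in committee) > 0


if __name__ == "__main__":
    docs = [{"wheat", "corn"}, {"wheat", "price"}, {"oil", "price"}, {"oil", "opec"}, {"gold"}]
    labels = [{"grain"}, {"grain"}, {"crude"}, {"crude"}, set()]
    cats = ["grain", "crude"]
    mp = train(docs, labels, cats, rounds=1)
    mh = train(docs, labels, cats, rounds=1, per_category_pivot=False)
    print({c: mp[0][c][0] for c in cats})  # {'grain': 'wheat', 'crude': 'oil'}
    print({c: mh[0][c][0] for c in cats})  # one shared pivot for both categories
```

On these toy documents, the per-category strategy picks "wheat" as the pivot for grain and "oil" for crude. The AdaBoost.MH round, by construction, must share one pivot between both categories. The exhaustive scan over the vocabulary is deliberately naive. A practical implementation would likely precompute, for each term, the list of documents that contain it.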