Boosting to correct inductive bias in text classification

Authors:
Yan Liu; Yiming Yang; Jaime Carbonell
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the eleventh international conference on Information and knowledge management
Year:
2002

Abstract

This paper studies the effects of boosting on different classification methods for text categorization, including Decision Trees, Naive Bayes, Support Vector Machines (SVMs) and a Rocchio-style classifier. We identify the inductive biases of each classifier and explore how boosting, as an error-driven resampling mechanism, reacts to those biases. Our experiments on the Reuters-21578 benchmark show that boosting does not improve the performance of the base classifiers on common categories. On rare categories, however, the effect of boosting varies across classifiers: for SVMs and Decision Trees, we achieved a 13-17% improvement in macro-averaged F1, but obtained no substantial improvement for the other two classifiers. This finding on rare categories has not been reported before.
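The abstract describes boosting as an error-driven resampling mechanism. The sketch below illustrates one round of that idea under an assumption: it uses the standard AdaBoost.M1 weight update, which the abstract does not confirm as the exact variant used in the paper. The function name `adaboost_reweight` and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def adaboost_reweight(weights, mistakes):
    """One AdaBoost.M1 round (assumed variant): return (alpha, new_weights).

    weights  -- current example weights, summing to 1
    mistakes -- mistakes[i] is True if the base classifier got example i wrong
    """
    # Weighted error of the current base classifier.
    err = sum(w for w, m in zip(weights, mistakes) if m)
    if not 0.0 < err < 0.5:
        raise ValueError("AdaBoost.M1 needs a weighted error in (0, 0.5)")
    # Vote weight of this classifier in the final ensemble.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Raise the weight of misclassified examples, lower the weight of the rest.
    raw = [w * math.exp(alpha if m else -alpha) for w, m in zip(weights, mistakes)]
    total = sum(raw)
    return alpha, [r / total for r in raw]


# Toy data: five training documents, the last one misclassified.
weights = [0.2] * 5
mistakes = [False, False, False, False, True]
alpha, new_weights = adaboost_reweight(weights, mistakes)

# Resampling step: draw the next training set in proportion to the new weights,
# so the misclassified document is more likely to be picked.
rng = random.Random(0)
next_sample = rng.choices(range(len(weights)), weights=new_weights, k=len(weights))
```

In this toy round the single misclassified document ends up with half of the total weight, so the next base classifier is trained on a sample that emphasizes it. How a given classifier responds to such a shifted sample depends on its inductive bias, which is the interaction the paper examines.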