This paper studies the effects of boosting on different classification methods for text categorization: Decision Trees, Naive Bayes, Support Vector Machines (SVMs), and a Rocchio-style classifier. We identify the inductive bias of each classifier and explore how boosting, as an error-driven resampling mechanism, reacts to those biases. Our experiments on the Reuters-21578 benchmark show that boosting does not improve the performance of the base classifiers on common categories. On rare categories, however, its effect varies across classifiers: for SVMs and Decision Trees we obtained a 13-17% improvement in macro-averaged F1, whereas the other two classifiers showed no substantial improvement. This effect of boosting on rare categories has not been reported before.
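The rare-category result is stated in macro-averaged F1, which averages per-category F1 scores so that every category counts equally regardless of its size. The sketch below illustrates that standard definition next to micro-averaged F1 for contrast. The function names and the two-category counts are hypothetical and chosen only for illustration; they are not taken from the paper or from Reuters-21578.

```python
def f1(tp, fp, fn):
    """F1 for one category from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts):
    """Average the per-category F1 scores; each category has equal weight."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts):
    """Pool the counts over all categories, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical (tp, fp, fn) counts: one common and one rare category.
counts = {"common": (900, 50, 50), "rare": (2, 1, 7)}

print(f"macro-F1 = {macro_f1(counts):.3f}")
print(f"micro-F1 = {micro_f1(counts):.3f}")
```

With these made-up counts the weak rare category pulls the macro average well below the micro average, which is why a gain confined to rare categories shows up mainly in macro-averaged F1.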