Text categorization is an important application domain of multi-label classification, in which each document can belong to more than one class simultaneously. The most common approach handles multi-label examples by inducing a separate binary classifier for each class and then applying these classifiers in parallel. What the information-retrieval community has all but ignored, however, is that such classifiers are almost always induced from highly imbalanced training sets. The study reported in this paper shows that taking this imbalance into account through majority-class undersampling can indeed improve classification performance as measured by criteria common in text categorization: macro- and micro-averaged precision, recall, and F1. We also show that a slight modification of an older undersampling technique improves the results further.
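The two ingredients the abstract mentions can be sketched briefly: random majority-class undersampling applied before training each per-class binary classifier, and the macro/micro-averaged F1 used for evaluation. This is a minimal illustrative sketch, not the paper's exact procedure — the function names, the set-of-labels representation, and the 1:1 balancing ratio are assumptions.

```python
import random

def undersample_majority(X, y, seed=0):
    """Sketch of majority-class undersampling for one binary task:
    keep all positive (minority) examples and randomly drop negatives
    until the classes are balanced. The 1:1 target ratio is an
    illustrative assumption, not the paper's prescribed value."""
    pos = [(x, l) for x, l in zip(X, y) if l == 1]
    neg = [(x, l) for x, l in zip(X, y) if l == 0]
    rng = random.Random(seed)
    rng.shuffle(neg)
    kept = pos + neg[:len(pos)]  # all positives, an equal number of negatives
    rng.shuffle(kept)
    return [x for x, _ in kept], [l for _, l in kept]

def micro_macro_f1(y_true, y_pred, n_classes):
    """Micro- and macro-averaged F1 for multi-label predictions,
    where each document's labels are given as a set of class indices."""
    tp = [0] * n_classes
    fp = [0] * n_classes
    fn = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        for c in range(n_classes):
            if c in p and c in t:
                tp[c] += 1
            elif c in p:
                fp[c] += 1
            elif c in t:
                fn[c] += 1

    def f1(tp_, fp_, fn_):
        prec = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        rec = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    # macro: average of per-class F1; micro: F1 over pooled counts
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in range(n_classes)) / n_classes
    micro = f1(sum(tp), sum(fp), sum(fn))
    return micro, macro
```

In a binary-relevance setup, `undersample_majority` would be called once per class on that class's positive/negative split before fitting its classifier; macro-averaging then weights rare classes equally, while micro-averaging is dominated by the frequent ones — which is why the abstract reports both.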