An interesting issue in machine learning is induction in multi-label domains, where each example can be labeled with two or more classes at the same time. In a work focusing on text categorization, we followed the most commonly used approach and induced a separate binary classifier for each class. Analyzing the results, we noticed that performance was impaired by two factors. First, in text domains, each class is characterized by a different set of attributes; an appropriate attribute-selection technique thus has to be applied separately to each class. Second, the individual classes often have to be induced from imbalanced training sets, a circumstance we addressed by majority-class undersampling. The paper provides details of the induction system and reports the results of systematic experimentation.
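The per-class preparation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the induction system reported in the paper: the function names, the toy corpus, and in particular the attribute-scoring rule (difference in document frequency between the positive and negative examples) are hypothetical stand-ins, since the abstract does not say which selection criterion or sampling ratio was used. The sketch balances each binary training set by random majority-class undersampling to a 1:1 ratio and then selects attributes separately for each class.

```python
import random
from collections import Counter


def binary_view(docs, labels, target):
    """Relabel a multi-label corpus as positive/negative for one class."""
    return [(doc, target in labs) for doc, labs in zip(docs, labels)]


def undersample_majority(examples, rng):
    """Randomly drop majority-class examples until both classes are equal in size.
    The 1:1 ratio is an assumption made for this sketch."""
    pos = [e for e in examples if e[1]]
    neg = [e for e in examples if not e[1]]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


def select_attributes(examples, k):
    """Keep the k words whose document frequency differs most between the
    positive and negative examples (hypothetical scoring rule)."""
    pos_df, neg_df = Counter(), Counter()
    n_pos = sum(1 for _, y in examples if y)
    n_neg = len(examples) - n_pos
    for doc, y in examples:
        (pos_df if y else neg_df).update(set(doc))
    vocab = set(pos_df) | set(neg_df)

    def score(word):
        return abs(pos_df[word] / max(n_pos, 1) - neg_df[word] / max(n_neg, 1))

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda w: (-score(w), w))[:k]


def prepare_per_class(docs, labels, k=2, seed=0):
    """Build one balanced binary training set and one attribute list per class."""
    rng = random.Random(seed)
    classes = sorted({c for labs in labels for c in labs})
    prepared = {}
    for c in classes:
        balanced = undersample_majority(binary_view(docs, labels, c), rng)
        prepared[c] = (balanced, select_attributes(balanced, k))
    return prepared


# Toy corpus: each document is a list of words with a set of class labels.
docs = [
    ["goal", "match", "team"],
    ["election", "vote", "team"],
    ["goal", "vote", "stadium"],
    ["market", "stock"],
    ["market", "vote"],
    ["stock", "price"],
]
labels = [
    {"sport"}, {"politics"}, {"sport", "politics"},
    {"finance"}, {"finance", "politics"}, {"finance"},
]
prepared = prepare_per_class(docs, labels)
```

Each entry of `prepared` holds the balanced examples and the attribute list for one class; a binary classifier of any kind could then be induced from each entry independently. Selecting attributes after undersampling, rather than before, is a design choice of this sketch only.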