Many real-world text classification tasks involve imbalanced training examples. However, the strategies proposed to address imbalanced classification (e.g., resampling, instance weighting) have not been systematically evaluated in the text domain. In this paper, we conduct a comparative study of the effectiveness of these strategies for imbalanced text classification using a Support Vector Machine (SVM) classifier. We focus on SVM because of the good classification accuracy reported for it in many text classification tasks. We propose a taxonomy that organizes the proposed strategies according to the training and test phases of a text classification task. Based on this taxonomy, we survey the methods proposed to address imbalanced classification. Among them, 10 commonly used methods were evaluated in our experiments on three benchmark datasets: Reuters-21578, 20-Newsgroups, and WebKB. Using the area under the Precision-Recall Curve as the performance measure, our experimental results showed that the best decision surface was often learned by the standard SVM, not coupled with any of the proposed strategies. We believe this negative finding will benefit both researchers and application developers in the area by directing more attention to thresholding strategies.
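As a rough illustration of the performance measure named above, the sketch below estimates the area under the Precision-Recall curve from a classifier's decision scores. It uses average precision (the mean of the precision values at the rank of each positive example), which is one common step-wise approximation; the abstract does not say how the area was computed, so this choice, the function name `average_precision`, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the Precision-Recall curve.

    labels: 1 for the minority (positive) class, 0 otherwise.
    scores: decision values, e.g. signed distances to an SVM hyperplane.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision measured at the rank of this positive example.
            precision_sum += true_pos / rank
    return precision_sum / total_pos


# Hypothetical imbalanced toy data: 2 positives among 8 examples.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.0, -0.5]

print(round(average_precision(toy_labels, toy_scores), 4))  # 0.8333

# Shifting every score by a constant keeps the ranking, so the value is unchanged.
shifted_scores = [s - 0.5 for s in toy_scores]
print(round(average_precision(toy_labels, shifted_scores), 4))  # 0.8333
```

Because the measure depends only on how the examples are ranked, moving the decision threshold (here, shifting every score by a constant) does not change it. That is consistent with treating the quality of the learned decision surface and the choice of threshold as separate questions.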