On Issues of Instance Selection
Data Mining and Knowledge Discovery
Integrating feature and instance selection for text classification
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
One-class svms for document classification
The Journal of Machine Learning Research
An extensive empirical study of feature selection metrics for text classification
The Journal of Machine Learning Research
Extreme re-balancing for SVMs: a case study
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
KBA: Kernel Boundary Alignment Considering Imbalanced Data Distribution
IEEE Transactions on Knowledge and Data Engineering
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
Floatcascade learning for fast imbalanced web mining
Proceedings of the 17th international conference on World Wide Web
On strategies for imbalanced text classification using SVM: A comparative study
Decision Support Systems
Sample cutting method for imbalanced text sentiment classification based on BRC
Knowledge-Based Systems
Hi-index | 0.00 |
Support Vector Machines (SVM) classifiers are widely used in text classification tasks and these tasks often involve imbalanced training. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm), is proposed to select only a subset of negative training documents for training a SVM classifier. With a smaller carefully selected training set, a SVM classifier can be more efficiently trained while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, using only 35% negative training examples and 60% learning time, methods based on FISA delivered much better classification accuracy than those methods using all negative training documents.