In many real-world applications, collecting a large amount of labeled data is time-consuming or expensive, while unlabeled data is easy to obtain. Many semi-supervised learning methods have been proposed to address this problem by exploiting the unlabeled data. At the same time, on some datasets misclassifications of different classes incur different costs, which violates the common assumption in classification that all misclassification costs are equal. For example, misclassifying a fraudulent transaction as legitimate is usually more serious than misclassifying a legitimate transaction as fraudulent. In this paper, we propose a cost-sensitive self-training method (CS-ST) that improves the performance of Naive Bayes when labeled instances are scarce and different misclassification errors carry different costs. CS-ST incorporates the misclassification costs into the self-training process and approximately estimates the misclassification error to guide the selection of unlabeled instances. Experiments on 13 UCI datasets and three text datasets show that, in terms of total misclassification cost and the number of correctly classified instances from the costlier classes, CS-ST outperforms both standard self-training and the base classifier learned from the original labeled data alone.
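To make the idea of cost-aware instance selection concrete, below is a minimal sketch of one plausible cost-sensitive self-training loop built around scikit-learn's GaussianNB. The cost matrix C, the expected-cost pseudo-labeling rule, and the "lowest expected cost first" selection heuristic are illustrative assumptions standing in for CS-ST's approximate error estimate; this is a sketch of the general technique, not the authors' exact procedure.

```python
# A minimal sketch of a cost-sensitive self-training loop (illustrative,
# not the CS-ST paper's exact algorithm). C is a cost matrix where
# C[i, j] is the cost of predicting class j when the true class is i;
# rows and columns of C follow the order of clf.classes_.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def cost_sensitive_self_train(X_lab, y_lab, X_pool, C, rounds=10, per_round=10):
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    pool = X_pool.copy()
    clf = GaussianNB().fit(X_lab, y_lab)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        # P[n, i] = estimated P(true class = classes_[i] | x_n)
        P = clf.predict_proba(pool)
        # exp_cost[n, j] = sum_i P[n, i] * C[i, j]:
        # expected misclassification cost of predicting class j for x_n
        exp_cost = P @ C
        # Pseudo-label each pool instance with its cost-minimizing class.
        j_star = exp_cost.argmin(axis=1)
        labels = clf.classes_[j_star]
        risk = exp_cost[np.arange(len(pool)), j_star]
        # Move the instances with the lowest expected cost (the ones the
        # current model is most "cost-safe" about) into the labeled set.
        pick = np.argsort(risk)[:per_round]
        X_lab = np.vstack([X_lab, pool[pick]])
        y_lab = np.concatenate([y_lab, labels[pick]])
        pool = np.delete(pool, pick, axis=0)
        clf = GaussianNB().fit(X_lab, y_lab)
    return clf
```

With an asymmetric cost matrix, e.g. C = np.array([[0, 1], [10, 0]]) when class 1 is the fraud class, the same posterior estimates both bias the pseudo-labels toward the expensive class and keep high-risk instances out of the growing training set, which is the intuition behind incorporating misclassification costs into self-training.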