Combining labeled and unlabeled data with co-training. In COLT '98: Proceedings of the Eleventh Annual Conference on Computational Learning Theory.
Text classification from labeled and unlabeled documents using EM. Machine Learning, special issue on information retrieval.
Transductive inference for text classification using support vector machines. In ICML '99: Proceedings of the Sixteenth International Conference on Machine Learning.
Learning from labeled and unlabeled data using graph mincuts. In ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning.
Unsupervised word sense disambiguation rivaling supervised methods. In ACL '95: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics.
Tri-training: exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering.
A hybrid generative/discriminative approach to semi-supervised classifier design. In AAAI '05: Proceedings of the 20th National Conference on Artificial Intelligence, Volume 2.
Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans.
Batch-mode active learning with semi-supervised cluster tree for text classification. In WI-IAT '12: Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology, Volume 01.
Semi-supervised learning has been a focus of machine learning and data mining research in recent years, and a wide range of algorithms and techniques has been proposed, from generative models to graph-based methods. In this work, we focus on cluster-and-label approaches to semi-supervised classification. Existing cluster-and-label algorithms rest on underlying models and assumptions: when the data fits the model well, classification accuracy is high; otherwise, accuracy is low. In this paper, we propose a refinement approach to address this model-misfit problem in semi-supervised classification. We show that the cluster-and-label technique itself need not be changed to make it more flexible. Instead, we propose successive refinement clustering of the dataset to correct the model misfit. A series of experiments on UCI benchmark data sets shows that the proposed approach outperforms existing cluster-and-label algorithms, as well as traditional semi-supervised classification techniques including Self-training and Tri-training.
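To make the two ideas in the abstract concrete, the sketch below shows (1) a basic cluster-and-label classifier, which clusters all points and fills in unlabeled points with the majority class of the labeled points in their cluster, and (2) a successive-refinement variant that re-clusters any "impure" cluster whose labeled members span more than one class. This is an illustrative sketch only, not the authors' exact algorithm; the function names, the use of k-means, the majority-vote rule, and the depth limit are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means (illustrative stand-in for any clustering step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts):                 # keep old center if cluster empties
                centers[j] = pts.mean(0)
    return assign

def cluster_and_label(X, y, k):
    """Basic cluster-and-label: y holds class ids, with -1 for unlabeled.
    Each unlabeled point gets the majority label of its cluster."""
    assign = kmeans(X, k)
    pred = y.copy()
    for j in range(k):
        members = assign == j
        labels = y[members & (y != -1)]
        if len(labels):
            vals, counts = np.unique(labels, return_counts=True)
            pred[members & (pred == -1)] = vals[counts.argmax()]
    return pred

def refine_and_label(X, y, k=2, max_depth=5):
    """Successive refinement (sketch of the paper's idea): if a cluster's
    labeled members span more than one class, the clustering model misfits
    that region, so re-cluster it; label clusters only once they are pure
    (or the depth limit is reached)."""
    pred = y.copy()
    stack = [(np.arange(len(X)), 0)]
    while stack:
        ids, depth = stack.pop()
        labels = y[ids][y[ids] != -1]
        classes = np.unique(labels)
        if len(classes) > 1 and depth < max_depth and len(ids) > k:
            assign = kmeans(X[ids], k, seed=depth)   # refine this cluster
            for j in range(k):
                stack.append((ids[assign == j], depth + 1))
        elif len(labels):
            # pure (or depth-limited) cluster: majority label fills the rest
            maj = classes[np.argmax([np.sum(labels == c) for c in classes])]
            pred[ids[pred[ids] == -1]] = maj
    return pred
```

The refinement loop is where a misfit is corrected: a flat k-means with the "one cluster per class" assumption would mislabel a class that occupies two separate regions, whereas re-clustering the impure cluster lets both regions receive the correct label without changing the base cluster-and-label procedure.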