A refinement approach to handling model misfit in semi-supervised learning

  • Authors:
  • Hanjing Su; Ling Chen; Yunming Ye; Zhaocai Sun; Qingyao Wu

  • Affiliations:
  • Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China; QCIS, Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia; Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China; Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China; Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China

  • Venue:
  • ADMA'10: Proceedings of the 6th International Conference on Advanced Data Mining and Applications, Volume Part II
  • Year:
  • 2010

Abstract

Semi-supervised learning has been a major focus of machine learning and data mining research in recent years, and many algorithms and techniques have been proposed, ranging from generative models to graph-based algorithms. In this work, we focus on Cluster-and-Label approaches to semi-supervised classification. Existing cluster-and-label algorithms rely on underlying models and/or assumptions: when the data fit the model well, classification accuracy is high; otherwise, it is low. In this paper, we propose a refinement approach to address this model misfit problem in semi-supervised classification. We show that the cluster-and-label technique itself does not need to be changed to make it more flexible. Instead, we propose successively refining the clustering of the dataset to correct the model misfit. A series of experiments on UCI benchmark data sets shows that the proposed approach outperforms existing cluster-and-label algorithms, as well as traditional semi-supervised classification techniques, including Self-training and Tri-training.
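The abstract does not give the refinement algorithm itself. The following is only a minimal sketch of the general idea, assuming a plain cluster-and-label step (k-means, then each cluster's unlabeled points take the majority label of its labeled points) and a hypothetical refinement rule: a cluster whose labeled points disagree is treated as a sign of model misfit and is re-clustered recursively up to `max_depth`. The names `kmeans`, `refine_label`, `k`, `max_depth`, and `fallback` are illustrative assumptions, not taken from the paper.

```python
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assign


def refine_label(points, labels, k=2, max_depth=3, depth=0, fallback=None):
    """labels[i] is a class or None (unlabeled); returns a completed label list.

    Hypothetical refinement rule: if the labeled points in a region disagree,
    re-cluster that region instead of forcing a single label onto it.
    """
    labeled = [y for y in labels if y is not None]
    # Majority label of this region; a region with no labels inherits its parent's.
    majority = Counter(labeled).most_common(1)[0][0] if labeled else fallback
    # Stop when labels agree (no evidence of misfit), the region is too small,
    # or the refinement depth is exhausted.
    if len(set(labeled)) <= 1 or depth >= max_depth or len(points) < k:
        return [y if y is not None else majority for y in labels]
    assign = kmeans(points, k)
    result = list(labels)
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if idx:
            sub = refine_label([points[i] for i in idx], [labels[i] for i in idx],
                               k, max_depth, depth + 1, majority)
            for i, y in zip(idx, sub):
                result[i] = y
    return result
```

For example, with two well-separated groups and one labeled point in each, `refine_label([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], ['a', None, None, 'b', None, None])` should label the first group `'a'` and the second `'b'`. The recursion leaves the clustering step untouched and only re-applies it to regions where it appears to misfit, which mirrors the abstract's point that the cluster-and-label technique itself need not change.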