A classification algorithm based on local cluster centers with a few labeled training examples

  • Authors:
  • Tianqiang Huang;Yangqiang Yu;Gongde Guo;Kai Li

  • Affiliations:
  • Department of Computer Science, School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China and Department of Computer Science and Technology, Tsinghua University, B ...;Department of Computer Science, School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China;Department of Computer Science, School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China;Department of Computer Science, School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2010

Abstract

Semi-supervised learning techniques, such as co-training paradigms, have been proposed to deal with data sets that contain only a few labeled examples. However, co-training paradigms such as Tri-training and Co-Forest are likely to mislabel some unlabeled examples, which degrades the final performance. In practical applications, the labeling process is itself not always free of error, because labeling can be subjective, so even the few labeled examples provided may include mislabeled ones. Supervised clustering provides many benefits in data mining research, but it is generally ineffective with only a few labeled examples. This paper proposes a Classification algorithm based on Local Cluster Centers (CLCC) for data sets with only a few labeled training examples. CLCC can reduce the interference of mislabeled data, whether the wrong labels come from domain experts or from co-training paradigm algorithms. Experimental results on UCI data sets show that CLCC achieves classification accuracy competitive with other traditional and state-of-the-art algorithms, such as SMO, AdaBoost, RandomTree, RandomForest, and Co-Forest.
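
The abstract does not spell out how CLCC builds or uses its local cluster centers, so the following is only a rough illustration of the general idea of classifying by local cluster centers, not the authors' algorithm. It assumes that each class is summarized by a few centers found with plain k-means run inside that class, and that a new example takes the label of its nearest center. The names `local_centers`, `fit`, and `predict`, the parameter `k`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def _mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def local_centers(points, k, iters=10):
    """Plain k-means on one class's points; returns at most k centers.

    Assumption: centers start at the first k points (an arbitrary,
    deterministic choice), and empty clusters are simply dropped.
    """
    centers = list(points[:k])
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        centers = [_mean(g) for g in groups.values()]
    return centers


def fit(X, y, k=2):
    """Return a list of (center, label) pairs, up to k centers per class."""
    by_class = defaultdict(list)
    for p, label in zip(X, y):
        by_class[label].append(tuple(p))
    return [(c, label)
            for label, pts in by_class.items()
            for c in local_centers(pts, k)]


def predict(model, x):
    """Label of the center closest to x (Euclidean distance)."""
    return min(model, key=lambda pair: math.dist(x, pair[0]))[1]


# Toy data (made up): class "a" forms two separate groups, class "b" one.
X = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5), (5, 6)]
y = ["a", "a", "a", "a", "b", "b"]
model = fit(X, y, k=2)
print(predict(model, (9, 9)), predict(model, (5, 5.4)))
```

Because each center is an average over several nearby examples, a single mislabeled point shifts a center only slightly. This is one plausible way a center-based classifier could dampen label noise, though the paper's actual mechanism may differ.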