A classification algorithm based on local cluster centers with a few labeled training examples

Authors:
Tianqiang Huang; Yangqiang Yu; Gongde Guo; Kai Li
Affiliations:
Department of Computer Science, School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China and Department of Computer Science and Technology, Tsinghua University, B ...; Department of Computer Science, School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China; Department of Computer Science, School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China; Department of Computer Science, School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China
Venue:
Knowledge-Based Systems
Year:
2010

References (14):

Combining labeled and unlabeled data with co-training
COLT '98 Proceedings of the Eleventh Annual Conference on Computational Learning Theory

Data mining: practical machine learning tools and techniques with Java implementations

Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval

Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning

Enhancing Supervised Learning with Unlabeled Data
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning

A decision-theoretic generalization of on-line learning and an application to boosting
EuroCOLT '95 Proceedings of the Second European Conference on Computational Learning Theory

A Mixture Model and EM-Based Algorithm for Class Discovery, Robust Classification, and Outlier Rejection in Mixed Labeled/Unlabeled Data Sets
IEEE Transactions on Pattern Analysis and Machine Intelligence

Semi-supervised learning using randomized mincuts
ICML '04 Proceedings of the Twenty-First International Conference on Machine Learning

Supervised Clustering - Algorithms and Benefits
ICTAI '04 Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence

Tri-Training: Exploiting Unlabeled Data Using Three Classifiers
IEEE Transactions on Knowledge and Data Engineering

A New Supervised Clustering Algorithm for Data Set with Mixed Attributes
SNPD '07 Proceedings of the Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing - Volume 02

Discovery of interesting regions in spatial data sets using supervised clustering
PKDD '06 Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases

A supervised clustering and classification algorithm for mining data with mixed variables
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans

Improve Computer-Aided Diagnosis With Machine Learning Techniques Using Undiagnosed Samples
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans

Cited by (3):

Semantically-grounded construction of centroids for datasets with textual attributes
Knowledge-Based Systems

A modification of the k-means method for quasi-unsupervised learning
Knowledge-Based Systems

Combining active learning and semi-supervised learning to construct SVM classifier
Knowledge-Based Systems

Abstract

Semi-supervised learning techniques, such as co-training paradigms, have been proposed to handle data sets that contain only a few labeled examples. However, co-training-style algorithms such as Tri-training and Co-Forest are prone to mislabeling unlabeled examples, which degrades their final performance. In practical applications, labeling is also not error-free because it involves subjective judgment, so even the few labeled examples provided may include mislabeled ones. Supervised clustering offers many benefits in data mining research, but it is generally ineffective when only a few labeled examples are available. This paper proposes CLCC, a Classification algorithm based on Local Cluster Centers, for data sets with only a few labeled training examples. CLCC can reduce the interference of mislabeled data, whether the wrong labels come from domain experts or from co-training algorithms. Experimental results on UCI data sets show that CLCC achieves classification accuracy competitive with traditional and state-of-the-art algorithms such as SMO, AdaBoost, RandomTree, RandomForest, and Co-Forest.
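The abstract does not describe how CLCC works internally. The Python sketch below illustrates only the general intuition that a label attached to a cluster center is less sensitive to a single mislabeled example than a label attached to one point. It rests on several assumptions: labeled and unlabeled points are clustered together with plain k-means (farthest-first seeding), each center takes the majority label of the labeled examples assigned to it, and new points get the label of their nearest labeled center. The names `NearestCenterClassifier`, `kmeans`, `farthest_first_init`, and `nearest` are hypothetical. This is a generic nearest-center scheme, not the authors' CLCC algorithm.

```python
import math
from collections import Counter

# Hypothetical sketch: a generic nearest-cluster-center classifier.
# It is NOT the CLCC algorithm from the paper, whose details are not given here.


def nearest(centers, x):
    """Return the index of the center closest to x (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(centers[j], x))


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; an empty cluster keeps its previous center."""
    centers = farthest_first_init(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers


class NearestCenterClassifier:
    """Cluster labeled + unlabeled points, label each center by majority
    vote of the labeled points assigned to it, predict by nearest center."""

    def __init__(self, k):
        self.k = k

    def fit(self, labeled, unlabeled):
        points = [x for x, _ in labeled] + list(unlabeled)
        centers = kmeans(points, self.k)
        votes = [Counter() for _ in centers]
        for x, y in labeled:
            votes[nearest(centers, x)][y] += 1
        # Drop centers that attracted no labeled example: they have no label.
        kept = [(c, v.most_common(1)[0][0]) for c, v in zip(centers, votes) if v]
        self.centers = [c for c, _ in kept]
        self.labels = [y for _, y in kept]
        return self

    def predict(self, x):
        return self.labels[nearest(self.centers, x)]


if __name__ == "__main__":
    # Two well-separated blobs; the point (0, 0.5) is deliberately mislabeled "b".
    labeled = [((0, 0), "a"), ((0.5, 0), "a"), ((0, 0.5), "b"),
               ((10, 10), "b"), ((10.5, 10), "b")]
    unlabeled = [(0.2, 0.3), (0.4, 0.4), (0.1, 0.1),
                 (9.8, 10.2), (10.2, 9.9), (10, 10.4)]
    clf = NearestCenterClassifier(k=2).fit(labeled, unlabeled)
    print(clf.predict((0, 0.6)))  # prints: a
    print(clf.predict((10, 9.5)))  # prints: b
```

On this toy data, a 1-nearest-neighbor rule over the five labeled points alone would label (0, 0.6) as "b", because its closest labeled neighbor is the mislabeled (0, 0.5). The majority vote at the cluster center outvotes that single error. With overlapping classes or a poorly chosen k, this simple scheme can fail, which is part of what a dedicated method must address.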