In many real-world applications, only very few labeled samples exist while a large number of unlabeled samples are available, so traditional semi-supervised algorithms struggle to train classifiers capable of evaluating the labeling confidence of unlabeled samples. This paper proposes SSCCE, a new semi-supervised classification algorithm based on clustering ensembles. It takes advantage of a clustering ensemble to generate multiple partitions of a given dataset, and then uses a clustering consistency index to determine the labeling confidence of unlabeled samples. The algorithm overcomes several shortcomings of traditional semi-supervised classification algorithms and enhances the performance of a hypothesis trained on very few labeled samples by exploiting a large number of unlabeled samples. Experiments on ten public data sets from the UCI machine learning repository show that the method is effective and feasible.
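The core idea can be sketched as follows. This is a minimal illustration, not the paper's actual SSCCE algorithm: the ensemble here is plain k-means rerun with different seeds, and the consistency index is taken to be the fraction of ensemble partitions whose cluster-majority vote agrees with a sample's most frequent vote. All function names, the number of partitions, and the confidence threshold are assumptions made for the example.

```python
import numpy as np

def kmeans_assign(X, k, seed, iters=25):
    """Minimal k-means (illustrative stand-in for an ensemble member);
    returns a cluster index for each sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return assign

def confident_pseudo_labels(X, y, k, n_partitions=12, threshold=0.8):
    """y holds a class label per sample, with -1 meaning unlabeled.
    Each partition votes for a sample via the majority class of the
    labeled samples in its cluster; the consistency index is the
    fraction of partitions agreeing with the sample's modal vote.
    Unlabeled samples whose index reaches `threshold` get pseudo-labels."""
    n = len(X)
    votes = np.full((n_partitions, n), -1)
    for p in range(n_partitions):
        assign = kmeans_assign(X, k, seed=p)
        for j in range(k):
            members = assign == j
            labels = y[members & (y != -1)]
            if len(labels):  # clusters with no labeled member abstain
                votes[p, members] = np.bincount(labels).argmax()
    pseudo = np.full(n, -1)
    conf = np.zeros(n)
    for i in range(n):
        v = votes[:, i]
        v = v[v >= 0]
        if len(v):
            counts = np.bincount(v)
            pseudo[i] = counts.argmax()
            conf[i] = counts.max() / n_partitions
    keep = (y == -1) & (conf >= threshold)
    return np.where(keep, pseudo, -1), conf

# Toy demo: two well-separated blobs, one labeled sample per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, size=(20, 2)),
               rng.normal(8.0, 0.2, size=(20, 2))])
y = np.full(40, -1)
y[0], y[20] = 0, 1
pseudo, conf = confident_pseudo_labels(X, y, k=2)
```

On this toy data the ensemble partitions agree almost everywhere, so most unlabeled samples receive a high consistency index and are pseudo-labeled with the class of the labeled sample in their blob; in SSCCE these confident pseudo-labels would then augment the training set of the supervised classifier.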