In many real-world applications, only very few labeled samples exist while a large number of unlabeled samples are available, so traditional semi-supervised algorithms struggle to train classifiers capable of evaluating the labeling confidence of unlabeled samples. This paper proposes SSCCE, a new semi-supervised classification algorithm based on clustering ensembles. It takes advantage of a clustering ensemble to generate multiple partitions of a given dataset, and then uses a clustering consistency index to determine the labeling confidence of unlabeled samples. The algorithm overcomes several shortcomings of traditional semi-supervised classification algorithms and enhances the performance of a hypothesis trained on very few labeled samples by exploiting a large number of unlabeled samples. Experiments on ten public data sets from the UCI machine learning repository show that the method is effective and feasible.
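The core idea can be sketched as follows. This is a minimal illustration, not the paper's actual SSCCE algorithm: the ensemble here is plain k-means rerun with different seeds, and the consistency index is taken to be the fraction of ensemble partitions whose cluster-majority vote agrees with a sample's most frequent vote. All function names, the number of partitions, and the confidence threshold are assumptions made for the example.

```python
import numpy as np

def kmeans_assign(X, k, seed, iters=25):
    """Minimal k-means (illustrative stand-in for an ensemble member);
    returns a cluster index for each sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return assign

def confident_pseudo_labels(X, y, k, n_partitions=12, threshold=0.8):
    """y holds a class label per sample, with -1 meaning unlabeled.
    Each partition votes for a sample via the majority class of the
    labeled samples in its cluster; the consistency index is the
    fraction of partitions agreeing with the sample's modal vote.
    Unlabeled samples whose index reaches `threshold` get pseudo-labels."""
    n = len(X)
    votes = np.full((n_partitions, n), -1)
    for p in range(n_partitions):
        assign = kmeans_assign(X, k, seed=p)
        for j in range(k):
            members = assign == j
            labels = y[members & (y != -1)]
            if len(labels):  # clusters with no labeled member abstain
                votes[p, members] = np.bincount(labels).argmax()
    pseudo = np.full(n, -1)
    conf = np.zeros(n)
    for i in range(n):
        v = votes[:, i]
        v = v[v >= 0]
        if len(v):
            counts = np.bincount(v)
            pseudo[i] = counts.argmax()
            conf[i] = counts.max() / n_partitions
    keep = (y == -1) & (conf >= threshold)
    return np.where(keep, pseudo, -1), conf

# Toy demo: two well-separated blobs, one labeled sample per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, size=(20, 2)),
               rng.normal(8.0, 0.2, size=(20, 2))])
y = np.full(40, -1)
y[0], y[20] = 0, 1
pseudo, conf = confident_pseudo_labels(X, y, k=2)
```

On this toy data the ensemble partitions agree almost everywhere, so most unlabeled samples receive a high consistency index and are pseudo-labeled with the class of the labeled sample in their blob; in SSCCE these confident pseudo-labels would then augment the training set of the supervised classifier.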