Semi-supervised learning applied to large data sets with very few labeled examples

Authors:
Hong Chen; Gongde Guo
Affiliations:
School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, Fujian, China; School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, Fujian, China
Venue:
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
Year:
2009



Abstract

A semi-supervised classification approach, SS-LFL, is proposed. In SS-LFL, a set of weak binary classifiers, each of which can identify instances of one particular class, is first trained on the labeled data, and the whole data set is then clustered into partitions until the partitions are sufficiently tight and pure. SS-LFL then alternates between assigning "imperfect-classes" to the unlabeled data in these partitions and constructing the next weak binary classifiers from both the labeled and the "imperfect" data. The approach works well on large data sets with very few labeled examples; moreover, it requires neither known parametric distributions of the data nor the participation of an expert. Experimental results on public data sets from the UCI Machine Learning Repository show that SS-LFL is a promising method.
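
The abstract gives no implementation details, so the following Python sketch only illustrates the alternation it describes and should not be read as the authors' algorithm. Everything specific in it is an assumption made for illustration: the weak binary classifier is a nearest-centroid test (class centroid versus centroid of all other classes), partitions are produced by recursive two-way splitting, "tight" means a radius below a hypothetical `max_radius`, "pure" means at most one seed label per partition, and the number of rounds is fixed. All function names are hypothetical.

```python
from collections import Counter
from math import dist

def centroid(pts):
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(len(pts[0])))

def train_weak_classifiers(points, labels):
    """Assumed weak learner: one (class centroid, rest centroid) pair per class."""
    clfs = {}
    for c in set(labels.values()):
        pos = [points[i] for i, y in labels.items() if y == c]
        neg = [points[i] for i, y in labels.items() if y != c]
        clfs[c] = (centroid(pos), centroid(neg) if neg else None)
    return clfs

def accepts(clf, x):
    """The binary classifier says 'mine' if x is closer to its class centroid."""
    pos_c, neg_c = clf
    return neg_c is None or dist(x, pos_c) < dist(x, neg_c)

def bisect(points, idx):
    """Assumed splitter: a few 2-means steps seeded with two far-apart points."""
    a = points[idx[0]]
    b = max((points[i] for i in idx), key=lambda p: dist(p, a))
    a = max((points[i] for i in idx), key=lambda p: dist(p, b))
    for _ in range(5):
        left = [i for i in idx if dist(points[i], a) <= dist(points[i], b)]
        right = [i for i in idx if dist(points[i], a) > dist(points[i], b)]
        if not left or not right:
            break
        a = centroid([points[i] for i in left])
        b = centroid([points[i] for i in right])
    return left, right

def partition(points, seed_labels, idx, max_radius):
    """Split recursively until each partition is tight and pure."""
    center = centroid([points[i] for i in idx])
    tight = max(dist(points[i], center) for i in idx) <= max_radius
    pure = len({seed_labels[i] for i in idx if i in seed_labels}) <= 1
    if (tight and pure) or len(idx) < 2:
        return [idx]
    left, right = bisect(points, idx)
    if not left or not right:  # cannot split further (e.g. identical points)
        return [idx]
    return (partition(points, seed_labels, left, max_radius)
            + partition(points, seed_labels, right, max_radius))

def ss_lfl_sketch(points, seed_labels, max_radius=1.0, rounds=3):
    parts = partition(points, seed_labels, list(range(len(points))), max_radius)
    labels = dict(seed_labels)  # seed labels plus current "imperfect" labels
    for _ in range(rounds):
        clfs = train_weak_classifiers(points, labels)
        imperfect = {}
        for part in parts:
            seeds = {seed_labels[i] for i in part if i in seed_labels}
            if len(seeds) == 1:
                guess = next(iter(seeds))  # partition inherits its seed label
            else:
                votes = Counter(c for i in part for c, clf in clfs.items()
                                if accepts(clf, points[i]))
                guess = votes.most_common(1)[0][0] if votes else None
            if guess is not None:
                imperfect.update({i: guess for i in part if i not in seed_labels})
        labels = {**imperfect, **seed_labels}  # seed labels are never overwritten
    return labels

# Toy data: eight points, only two of them labeled.
points = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.2, 5.1), (4.9, 5.3),
          (2.0, 2.0), (2.1, 1.9)]
result = ss_lfl_sketch(points, {0: "a", 3: "b"})
```

In this toy run, partitions that contain a labeled point inherit its label, while the partition without any labeled point (indices 6 and 7) receives its "imperfect" class from the votes of the weak classifiers, which are then retrained on the enlarged label set in the next round.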