The abundance of unlabelled data alongside limited labelled data has provoked significant interest in semi-supervised learning methods. "Naive labelling" refers to the following simple strategy for using unlabelled data in on-line classification: a new data point is first labelled by the current classifier and then added to the training set together with the assigned label, and the classifier is updated before the next data point arrives. Although the danger of a run-away classifier is obvious, versions of naive labelling pervade on-line adaptive learning. We study the asymptotic behaviour of naive labelling in the case of two Gaussian classes and one variable. The analysis shows that if the classifier model correctly matches the underlying distribution of the problem, naive labelling drives the parameters of the classifier towards their optimal values. If the model is misspecified, however, the benefits are outweighed by the instability of the labelling strategy (run-away behaviour of the classifier). The results are based on exact calculation of the point of convergence, simulations, and experiments with 25 real data sets. Our findings are consistent with concerns about the general use of unlabelled data flagged up in the recent literature.
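The following is a minimal sketch of the naive labelling loop described above, for the one-variable, two-Gaussian setting. The true class parameters, the labelled seed size, and the choice of a nearest-mean classifier with running-mean updates are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-variable, two-Gaussian problem (assumed parameters).
MU = (-1.0, 1.0)   # true class means
SIGMA = 1.0        # shared true standard deviation

def sample(n):
    """Draw n points with their true labels from the two-Gaussian problem."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=np.where(y == 0, MU[0], MU[1]), scale=SIGMA)
    return x, y

# Initialise a nearest-mean classifier from a small labelled sample.
x0, y0 = sample(20)
means = np.array([x0[y0 == 0].mean(), x0[y0 == 1].mean()])
counts = np.array([(y0 == 0).sum(), (y0 == 1).sum()], dtype=float)

# Naive labelling: each new point is labelled by the current classifier,
# then treated as ground truth to update that same classifier.
x_stream, _ = sample(5000)   # labels of the stream are never used
for x in x_stream:
    y_hat = int(abs(x - means[1]) < abs(x - means[0]))   # self-assigned label
    counts[y_hat] += 1
    means[y_hat] += (x - means[y_hat]) / counts[y_hat]   # running-mean update

print("estimated means:", means)   # compare against the true means (-1, 1)
```

Because the nearest-mean model matches the generating distribution here, the estimated means should settle near the true ones; replacing the generator with a distribution the model cannot represent is one way to observe the run-away behaviour the abstract describes.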