A case-study on naïve labelling for the nearest mean and the linear discriminant classifiers

  • Authors:
  • L. I. Kuncheva; C. J. Whitaker; A. Narasimhamurthy

  • Affiliations:
  • School of Computer Science, Bangor University, Bangor LL57 1UT, UK; School of Psychology, Bangor University, Bangor, UK; School of Computer Science and Informatics, University College Dublin (UCD), Dublin, Ireland

  • Venue:
  • Pattern Recognition
  • Year:
  • 2008

Abstract

The abundance of unlabelled data alongside limited labelled data has provoked significant interest in semi-supervised learning methods. "Naive labelling" refers to the following simple strategy for using unlabelled data in on-line classification: a new data point is first labelled by the current classifier and then added to the training set together with the assigned label; the classifier is updated before seeing the subsequent data point. Although the danger of a run-away classifier is obvious, versions of naive labelling pervade on-line adaptive learning. We study the asymptotic behaviour of naive labelling in the case of two Gaussian classes and one variable. The analysis shows that if the classifier model correctly assumes the underlying distribution of the problem, naive labelling will drive the parameters of the classifier towards their optimal values. If the model is not guessed correctly, however, the benefits are outweighed by the instability of the labelling strategy (run-away behaviour of the classifier). The results are based on exact calculations of the point of convergence, simulations, and experiments with 25 real data sets. Our findings are consistent with concerns about the general use of unlabelled data flagged up in the recent literature.
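The naive-labelling loop described above can be sketched for the nearest mean classifier on the paper's one-variable, two-Gaussian setting. This is a minimal illustration under assumed parameters (unit-variance classes at -1 and +1, five labelled seeds per class), not the authors' experimental code: each unlabelled point is labelled by the current classifier and immediately folded into that class's running mean.

```python
import random

random.seed(0)

# Assumed illustration: two unit-variance Gaussian classes, one variable.
MU0, MU1 = -1.0, 1.0

def sample(label):
    """Draw one point from the true distribution of the given class."""
    return random.gauss(MU1 if label else MU0, 1.0)

# Start from a small labelled seed set: five points per class.
means = [sum(sample(0) for _ in range(5)) / 5,
         sum(sample(1) for _ in range(5)) / 5]
counts = [5, 5]

# Naive labelling: the new point gets the label of the nearest class mean,
# then updates that mean before the next point arrives.
for _ in range(10000):
    x = sample(random.randint(0, 1))          # unlabelled stream
    y_hat = 0 if abs(x - means[0]) <= abs(x - means[1]) else 1
    counts[y_hat] += 1
    means[y_hat] += (x - means[y_hat]) / counts[y_hat]  # incremental mean

print(means)
```

Because the nearest mean model matches the equal-variance Gaussian setting here, the two estimated means stay separated on either side of the true decision boundary; under a mismatched model the same loop can exhibit the run-away behaviour the paper analyses.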