A rather simple semi-supervised version of the equally simple nearest mean classifier is presented. Simple as it is, the proposed approach is of practical interest: the nearest mean classifier remains a relevant tool in biomedical applications and other areas that deal with relatively high-dimensional feature spaces or small sample sizes. More importantly, the performance of our semi-supervised nearest mean classifier is typically expected to improve over that of its standard supervised counterpart, and it typically does not deteriorate as the number of unlabeled samples grows. This behavior is achieved by constraining the estimated parameters to comply with relevant information in the unlabeled data, which, in expectation, leads to faster convergence to the large-sample solution because the variance of the estimate is reduced. In a sense, our proposal demonstrates that a known classification scheme can be trained in such a way that it benefits from unlabeled data, while avoiding the additional assumptions typically made in semi-supervised learning.
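The abstract does not spell out the constraint itself, so the following is only a plausible sketch of the general idea: estimate the class means from the labeled data, then shift them by a common correction so that their prior-weighted average agrees with the mean of all data, labeled and unlabeled together. The function name and the exact form of the constraint are assumptions for illustration, not the paper's definitive procedure.

```python
import numpy as np

def semi_supervised_nearest_mean(X_lab, y_lab, X_unl):
    """Sketch of a moment-constrained nearest mean classifier.

    Assumption: the 'relevant information in the unlabeled data' is the
    overall data mean, and the class means are corrected so that their
    prior-weighted average matches it.
    """
    classes = np.unique(y_lab)
    # class priors and class means estimated from the labeled data only
    priors = np.array([(y_lab == c).mean() for c in classes])
    means = np.array([X_lab[y_lab == c].mean(axis=0) for c in classes])
    # overall mean over labeled and unlabeled samples together
    m_all = np.vstack([X_lab, X_unl]).mean(axis=0)
    # common shift so that sum_k prior_k * mean_k equals m_all;
    # since the priors sum to one, adding the same correction to
    # every class mean enforces the constraint exactly
    correction = m_all - priors @ means
    means_c = means + correction

    def predict(X):
        # assign each sample to the nearest (constrained) class mean
        d = ((X[:, None, :] - means_c[None, :, :]) ** 2).sum(axis=2)
        return classes[np.argmin(d, axis=1)]

    return predict
```

Note the design choice: the correction uses the unlabeled data only through a low-variance summary statistic (the overall mean), which is why no extra semi-supervised assumptions, such as a cluster or manifold assumption, are needed.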