Keepin' it real: semi-supervised learning with realistic tuning

  • Authors:
  • Andrew B. Goldberg; Xiaojin Zhu

  • Affiliations:
  • University of Wisconsin-Madison, Madison, WI (both authors)

  • Venue:
  • SemiSupLearn '09 Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
  • Year:
  • 2009


Abstract

We address two critical issues involved in applying semi-supervised learning (SSL) to a real-world task: parameter tuning and choosing which (if any) SSL algorithm is best suited for the task at hand. To gain a better understanding of these issues, we carry out a medium-scale empirical study comparing supervised learning (SL) to two popular SSL algorithms on eight natural language processing tasks under three performance metrics. We simulate how a practitioner would go about tackling a new problem, including parameter tuning using cross validation (CV). We show that, under such realistic conditions, each of the SSL algorithms can be worse than SL on some datasets. However, we also show that CV can select between SL and SSL to achieve "agnostic SSL," whose performance is almost always no worse than SL. While CV is often dismissed as unreliable for SSL due to the small amount of labeled data, we show that it is in fact effective for accuracy even when the labeled dataset contains as few as 10 examples.
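
The "agnostic SSL" recipe described in the abstract (cross-validate over the small labeled set to decide whether a supervised or a semi-supervised learner should be deployed) can be sketched as follows. This is not the authors' code: the toy data, the 5-fold split, the logistic regression baseline, and scikit-learn's SelfTrainingClassifier are illustrative assumptions standing in for the paper's NLP tasks and SSL algorithms.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.semi_supervised import SelfTrainingClassifier

# Toy stand-in for an NLP task: 10 labeled points (5 per class) plus 500 unlabeled.
X, y = make_classification(n_samples=510, n_features=20, random_state=0)
lab_idx = np.concatenate([np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]])
unlab_mask = np.ones(len(y), dtype=bool)
unlab_mask[lab_idx] = False
X_lab, y_lab, X_unlab = X[lab_idx], y[lab_idx], X[unlab_mask]

def cv_accuracy(build_model, use_unlabeled, n_splits=5):
    """Cross-validate on the labeled data only; for SSL models, append the
    unlabeled pool (marked -1, scikit-learn's convention) to each training fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X_lab, y_lab):
        X_tr, y_tr = X_lab[train_idx], y_lab[train_idx]
        if use_unlabeled:
            X_tr = np.vstack([X_tr, X_unlab])
            y_tr = np.concatenate([y_tr, -np.ones(len(X_unlab), dtype=int)])
        model = build_model()
        model.fit(X_tr, y_tr)
        scores.append(accuracy_score(y_lab[test_idx], model.predict(X_lab[test_idx])))
    return float(np.mean(scores))

# Candidate learners: a plain supervised baseline vs. a self-training SSL wrapper.
candidates = {
    "SL (logistic regression)": (lambda: LogisticRegression(max_iter=1000), False),
    "SSL (self-training)": (
        lambda: SelfTrainingClassifier(LogisticRegression(max_iter=1000)), True),
}

cv_scores = {name: cv_accuracy(build, ssl) for name, (build, ssl) in candidates.items()}
best = max(cv_scores, key=cv_scores.get)
print(cv_scores)
print("CV picks:", best)  # deploy whichever learner CV prefers ("agnostic SSL")
```

The key point of the sketch is that the same small labeled set drives both tuning and model choice: if the SSL candidate hurts on a given dataset, CV falls back to the supervised baseline, which is why the combined procedure is almost never worse than SL alone.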