Scaling up semi-supervised learning: an efficient and effective LLGC variant

Authors:
Bernhard Pfahringer;Claire Leschi;Peter Reutemann
Affiliations:
Department of Computer Science, University of Waikato, Hamilton, New Zealand;INSA Lyon, France;Department of Computer Science, University of Waikato, Hamilton, New Zealand
Venue:
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2007

Abstract

Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve classification. Unfortunately, most of the theoretically well-founded algorithms described in recent years are cubic or worse in the total number of labeled and unlabeled training examples. In this paper we modify the standard LLGC algorithm to improve its efficiency to the point where it can handle datasets with hundreds of thousands of training examples. The modifications are priming of the unlabeled data and, most importantly, sparsification of the similarity matrix. We report promising results on large text classification problems.
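
To make the role of sparsification concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly cited LLGC (learning with local and global consistency) update F <- alpha * S * F + (1 - alpha) * Y with S = D^(-1/2) W D^(-1/2), and it assumes that sparsification means keeping only each example's k nearest neighbours in W; the abstract does not specify either detail. The function names, the Gaussian similarity, and the parameters k, sigma, alpha and iters are all hypothetical choices for illustration. Priming of the unlabeled data is not shown.

```python
import math


def knn_sparsify(points, k, sigma=1.0):
    """Build a sparse, symmetric similarity matrix as {i: {j: w_ij}}.

    Assumption: only each point's k nearest neighbours are kept.
    The neighbour search here is brute force for brevity.
    """
    n = len(points)
    W = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            w = math.exp(-d * d / (2.0 * sigma * sigma))
            W[i][j] = w
            W[j][i] = w  # keep the matrix symmetric
    return W


def llgc_sparse(W, labels, n_classes, alpha=0.99, iters=50):
    """Propagate labels over the sparse graph W.

    labels[i] is a class index for labeled examples and None otherwise.
    Each iteration touches only the stored (non-zero) entries of W.
    """
    n = len(W)
    deg = [sum(W[i].values()) for i in range(n)]
    # Y holds the initial label assignment; unlabeled rows are all zero.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            row = [(1.0 - alpha) * y for y in Y[i]]
            for j, w in W[i].items():
                s = w / math.sqrt(deg[i] * deg[j])  # normalised similarity
                for c in range(n_classes):
                    row[c] += alpha * s * F[j][c]
            new_F.append(row)
        F = new_F
    # Predict the class with the largest propagated score.
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy usage: two well-separated clusters with one labeled example each.
points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
labels = [0, None, None, 1, None, None]
predictions = llgc_sparse(knn_sparsify(points, k=2), labels, n_classes=2)
```

With a sparse W, one propagation step costs time proportional to the number of stored similarities rather than to the square of the number of examples, which is the kind of saving the abstract attributes to sparsification.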