Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Diffusion Kernels on Graphs and Other Discrete Input Spaces
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Learning from Labeled and Unlabeled Data using Graph Mincuts
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Unsupervised word sense disambiguation rivaling supervised methods
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
RCV1: A New Benchmark Collection for Text Categorization Research
The Journal of Machine Learning Research
Semi-Supervised Self-Training of Object Detection Models
WACV-MOTION '05 Proceedings of the Seventh IEEE Workshops on Application of Computer Vision (WACV/MOTION'05) - Volume 1 - Volume 01
Semi-supervised protein classification using cluster kernels
Bioinformatics
Learning from labeled and unlabeled data on a directed graph
ICML '05 Proceedings of the 22nd international conference on Machine learning
ICML '05 Proceedings of the 22nd international conference on Machine learning
Semi-supervised learning with graphs
Semi-supervised learning with graphs
On a theory of learning with similarity functions
ICML '06 Proceedings of the 23rd international conference on Machine learning
Cover trees for nearest neighbor
ICML '06 Proceedings of the 23rd international conference on Machine learning
Using weighted nearest neighbor to benefit from unlabeled data
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Budget Semi-supervised Learning
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Exploiting propositionalization based on random relational rules for semi-supervised learning
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Employing document dependency in blog search
Journal of the American Society for Information Science and Technology
Hi-index | 0.00 |
Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semisupervised learning tries to exploit this abundance of unlabeled training data to improve classification. Unfortunately most of the theoretically well-founded algorithms that have been described in recent years are cubic or worse in the total number of both labeled and unlabeled training examples. In this paper we apply modifications to the standard LLGC algorithm to improve efficiency to a point where we can handle datasets with hundreds of thousands of training data. The modifications are priming of the unlabeled data, and most importantly, sparsification of the similarity matrix. We report promising results on large text classification problems.