Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets

Authors:
Sandra Ebert;Mario Fritz;Bernt Schiele
Affiliations:
Max Planck Institute for Informatics, Saarbrucken, Germany;Max Planck Institute for Informatics, Saarbrucken, Germany;Max Planck Institute for Informatics, Saarbrucken, Germany
Venue:
ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part I
Year:
2012

Abstract

Internet data sources provide large image datasets, most of which come without any explicit labeling. This setting is ideal for semi-supervised learning, which seeks to exploit labeled data together with a large pool of unlabeled data points to improve learning and classification. While considerable progress has been made on theory and algorithms, there has been limited success in translating that progress to the large-scale datasets that inspired these methods. We investigate the computational complexity of popular graph-based semi-supervised learning algorithms, together with different possible speed-ups. Our findings lead to a new algorithm that scales to datasets up to 40 times larger than those handled by previous approaches and also increases classification performance. Our method is based on the key insight that, by employing a density-based measure, unlabeled data points can be selected in a manner similar to an active learning scheme. This leads to a compact graph, which improves performance by up to 11.6% at reduced computational cost.
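
The idea of selecting unlabeled points by density and then running graph-based propagation on the resulting compact graph can be illustrated with a minimal sketch. The abstract does not specify the density measure or the propagation algorithm, so everything below is an assumption made for illustration: the density proxy is the inverse mean distance to the k nearest neighbours, the propagation step is a generic normalized-graph label propagation on a dense RBF graph, and the names `knn_density`, `select_dense_unlabeled`, and `propagate_labels` are hypothetical. It should not be read as the authors' implementation.

```python
import numpy as np

def knn_density(X, k=5):
    """Assumed density proxy: inverse mean distance to the k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # After sorting, column 0 is the zero self-distance, so it is skipped.
    knn = np.sort(d, axis=1)[:, 1:k + 1]
    return 1.0 / (knn.mean(axis=1) + 1e-12)

def select_dense_unlabeled(X_unlabeled, budget, k=5):
    """Indices of the `budget` unlabeled points with the highest density."""
    density = knn_density(X_unlabeled, k)
    return np.argsort(-density)[:budget]

def propagate_labels(X, y, n_classes, sigma=1.0, alpha=0.9, n_iter=50):
    """Generic label propagation on a dense RBF graph; y == -1 marks unlabeled."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))  # symmetric normalization
    Y0 = np.zeros((len(X), n_classes))
    idx = np.flatnonzero(y >= 0)
    Y0[idx, y[idx]] = 1.0  # one-hot rows for the labeled points
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y0
    return F.argmax(axis=1)

# Toy data (synthetic, for illustration only): two blobs, one label per blob.
rng = np.random.default_rng(0)
X_unl = np.vstack([rng.normal(0.0, 0.5, (200, 2)),
                   rng.normal(6.0, 0.5, (200, 2))])
true_unl = np.repeat([0, 1], 200)
X_lab = np.array([[0.0, 0.0], [6.0, 6.0]])
y_lab = np.array([0, 1])

# Keep only a budget of dense unlabeled points, then propagate on the small graph.
keep = select_dense_unlabeled(X_unl, budget=40)
X_small = np.vstack([X_lab, X_unl[keep]])
y_small = np.concatenate([y_lab, -np.ones(len(keep), dtype=int)])
pred = propagate_labels(X_small, y_small, n_classes=2)
accuracy = (pred[2:] == true_unl[keep]).mean()
print(f"graph size: {len(X_small)} of {len(X_unl) + 2} points, "
      f"accuracy on kept points: {accuracy:.2f}")
```

In this sketch the graph is built over 42 points rather than 402, so the dense affinity matrix shrinks by roughly two orders of magnitude. That is the kind of saving a budgeted selection aims for, although the toy data says nothing about the performance figures reported in the abstract.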