Active learning with extremely sparse labeled examples

  • Authors:
  • Shiliang Sun; David R. Hardoon

  • Affiliations:
  • Department of Computer Science and Technology, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China; Data Mining Department, Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, #20-10 Connexis, Singapore

  • Venue:
  • Neurocomputing
  • Year:
  • 2010


Abstract

In active learning it is generally assumed that labeled examples are available for training a classifier, which is then used to examine unlabeled data and select the most 'informative' examples for manual labeling. However, in some application domains only a limited number of labeled examples is available, in the most extreme case a single labeled example per category. In such scenarios, most existing active learning methodologies cannot be applied directly without first making an assumption about label assignment. In this paper we present a method for finding highly informative examples for manual labeling when the labeled data available during training are extremely limited. We propose using canonical correlation analysis to investigate the correlation between different views of the available data, and we demonstrate that this measure can serve as a selection criterion for the novel application of active learning with only a single labeled example from each class. We demonstrate our method with promising experimental results on text classification, advertisement removal and multi-class image classification tasks.
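
The abstract does not spell out how the correlation between views is turned into a selection score, so the following is only a minimal sketch of the general idea: fit canonical correlation analysis on two views of the data and rank unlabeled examples by how well (or poorly) their projected views agree. The function name select_examples_for_labeling, the views X_view1/X_view2, and the cosine-agreement score are all hypothetical placeholders, not the authors' actual criterion.

```python
# Minimal sketch (not the paper's exact method): CCA-based ranking of
# unlabeled examples for manual labeling, assuming two feature views
# of the same examples (e.g., page text and link text).
import numpy as np
from sklearn.cross_decomposition import CCA


def select_examples_for_labeling(X1, X2, n_components=2, n_select=5):
    """Project both views with CCA and rank examples by a placeholder
    per-example agreement score between their projections."""
    cca = CCA(n_components=n_components)
    Z1, Z2 = cca.fit_transform(X1, X2)  # paired projections of the two views
    # Cosine similarity between the two projected views of each example.
    num = np.sum(Z1 * Z2, axis=1)
    denom = np.linalg.norm(Z1, axis=1) * np.linalg.norm(Z2, axis=1) + 1e-12
    agreement = num / denom
    # Here, examples whose views agree least under the CCA mapping are
    # treated as the candidates to send for manual labeling.
    return np.argsort(agreement)[:n_select]


if __name__ == "__main__":
    # Synthetic two-view data sharing a common latent structure.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(100, 4))
    X_view1 = latent @ rng.normal(size=(4, 20)) + 0.1 * rng.normal(size=(100, 20))
    X_view2 = latent @ rng.normal(size=(4, 30)) + 0.1 * rng.normal(size=(100, 30))
    print(select_examples_for_labeling(X_view1, X_view2))
```

Whether high or low cross-view agreement should count as "informative" depends on the criterion the paper actually uses; the sketch simply illustrates how CCA projections of two views can yield a per-example score to drive the selection.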