Complexity bounds for batch active learning in classification

  • Authors:
  • Philippe Rolet; Olivier Teytaud

  • Affiliations:
  • Inria, LRI, UMR CNRS, Université Paris-Sud, Orsay Cedex, France (both authors)

  • Venue:
  • ECML PKDD'10: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases, Part III
  • Year:
  • 2010

Abstract

Active learning [1] is a branch of machine learning in which the learning algorithm, instead of being directly provided with pairs of problem instances and their solutions (their labels), is allowed to choose, from a set of unlabeled data, which instances to query. It is suited to settings where labeling instances is costly. This paper analyzes the speed-up of batch (parallel) active learning over sequential active learning (where instances are chosen one by one): how much faster can an algorithm become if it can query λ instances at once? There are two main contributions: lower and upper bounds on the possible gain, and experiments illustrating these bounds with standard active learning algorithms. Roughly speaking, the speed-up is asymptotically logarithmic in the batch size λ (i.e., as λ → ∞). However, for some classes of functions with finite VC-dimension V, a linear speed-up can be achieved up to a batch size of V. Practically speaking, this means that on an expensive-to-label problem suited to active learning, parallelizing queries is very beneficial up to V simultaneous queries, and less interesting (though still an improvement) beyond that. A minimal sketch of the batch setting appears below.
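
To make the setting concrete, here is a minimal sketch of pool-based batch active learning with uncertainty sampling. This is an illustration of the query model the paper analyzes, not the paper's own algorithm; the classifier, the oracle interface, and the seeding strategy are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def batch_active_learning(X_pool, oracle, n_rounds, lam, ):
    """Pool-based active learning where each round queries `lam` instances
    at once. lam=1 recovers the sequential setting; larger lam is the
    batch (parallel) setting whose speed-up the paper bounds.

    `oracle(i)` returns the (costly) label of pool instance i; it is a
    hypothetical interface standing in for a human annotator or experiment.
    """
    # Seed the labeled set with one example of each class
    # (assumes both classes occur in the pool).
    pos = next(i for i in range(len(X_pool)) if oracle(i) == 1)
    neg = next(i for i in range(len(X_pool)) if oracle(i) == 0)
    labeled = [pos, neg]
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

    model = LogisticRegression()
    for _ in range(n_rounds):
        model.fit(X_pool[labeled], [oracle(i) for i in labeled])
        # Uncertainty sampling: query the lam pool points whose predicted
        # class probability is closest to 0.5 (most ambiguous).
        proba = model.predict_proba(X_pool[unlabeled])[:, 1]
        order = np.argsort(np.abs(proba - 0.5))[:lam]
        batch = [unlabeled[j] for j in order]
        labeled += batch
        unlabeled = [i for i in unlabeled if i not in batch]
    return model
```

Comparing the label budget needed to reach a fixed accuracy with lam = 1 against lam > 1 gives an empirical view of the speed-up: roughly linear gains up to a batch size around the VC-dimension V, and only logarithmic gains as lam grows further.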