Boosting Active Learning to Optimality: A Tractable Monte-Carlo, Billiard-Based Algorithm

  • Authors:
  • Philippe Rolet; Michèle Sebag; Olivier Teytaud

  • Affiliations:
  • TAO, CNRS - INRIA - Univ. Paris-Sud (all authors)

  • Venue:
  • ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II
  • Year:
  • 2009

Abstract

This paper focuses on Active Learning with a limited number of queries; in application domains such as Numerical Engineering, the size of the training set might be limited to a few dozen or a few hundred examples due to computational constraints. Active Learning under bounded resources is formalized as a finite-horizon Reinforcement Learning problem, where the sampling strategy aims at minimizing the expectation of the generalization error. A tractable approximation of the optimal (intractable) policy is presented: the Bandit-based Active Learner (BAAL) algorithm. Viewing Active Learning as a single-player game, BAAL combines UCT, the tree-structured multi-armed bandit algorithm proposed by Kocsis and Szepesvári (2006), with billiard algorithms. A proof of principle of the approach demonstrates its good empirical convergence toward an optimal policy and its ability to incorporate prior Active Learning criteria. Its hybridization with the Query-by-Committee (QbC) approach is found to improve on both stand-alone BAAL and stand-alone QbC.
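
As a rough illustration of the single-player-game view sketched in the abstract, the snippet below shows a generic UCB1-style bandit loop choosing which unlabeled point to query next. It is a minimal sketch under stated assumptions, not the authors' BAAL algorithm: the candidate pool, the reward function (a Monte-Carlo proxy for the expected reduction of the generalization error), and all parameter values are hypothetical.

```python
# Hypothetical UCB1-style selection of the next active-learning query.
# NOT the BAAL algorithm from the paper: reward, pool, and constants are
# placeholder assumptions used only to illustrate the bandit viewpoint.
import math
import random


def ucb1_select(stats, total, c=1.4):
    """Return the candidate index maximizing the UCB1 score."""
    best, best_score = None, -float("inf")
    for i, (n, total_reward) in enumerate(stats):
        if n == 0:
            return i  # explore unvisited candidates first
        score = total_reward / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best


def choose_query(candidates, simulate_reward, budget=200):
    """Run `budget` bandit rounds and return the most-visited candidate index.

    `simulate_reward(x)` is assumed to return a noisy estimate of how much
    labeling x would reduce the generalization error (e.g. obtained from a
    Monte-Carlo rollout of the remaining query budget)."""
    stats = [(0, 0.0) for _ in candidates]
    for t in range(1, budget + 1):
        i = ucb1_select(stats, t)
        r = simulate_reward(candidates[i])
        n, s = stats[i]
        stats[i] = (n + 1, s + r)
    return max(range(len(candidates)), key=lambda i: stats[i][0])


if __name__ == "__main__":
    random.seed(0)
    pool = [0.1, 0.4, 0.5, 0.9]  # hypothetical unlabeled 1-D points
    # Toy reward: points near an (unknown) decision boundary at 0.5 score higher.
    reward = lambda x: max(0.0, 1.0 - abs(x - 0.5)) + random.gauss(0.0, 0.05)
    print("selected query index:", choose_query(pool, reward))
```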