Algorithm selection as a bandit problem with unbounded losses

Authors:
Matteo Gagliolo; Jürgen Schmidhuber
Affiliations:
IDSIA, Manno, Lugano, Switzerland and University of Lugano, Faculty of Informatics, Lugano, Switzerland; IDSIA, Manno, Lugano, Switzerland and University of Lugano, Faculty of Informatics, Lugano, Switzerland
Venue:
LION'10 Proceedings of the 4th International Conference on Learning and Intelligent Optimization
Year:
2010

Abstract

Algorithm selection is typically based on models of algorithm performance that are learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which a performance model is iteratively updated and used to guide selection on a sequence of problem instances. We represented the resulting exploration-exploitation trade-off as a bandit problem with expert advice and applied an existing solver for this game. This, however, required setting an arbitrary bound on algorithm runtimes, which invalidated the solver's optimal regret guarantee. In this paper, we propose a simpler framework that represents algorithm selection as a bandit problem with partial information and an unknown bound on losses. We adapt an existing solver to this game and prove a bound on its expected regret, which also holds for the resulting algorithm selection technique. We present experiments with a set of SAT solvers on a mixed SAT-UNSAT benchmark.
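
The setting described in the abstract can be illustrated with a minimal sketch. It is not the solver analysed in the paper and carries no regret guarantee. It is a generic Exp3-style rule in which each arm is an algorithm, the loss is its runtime, only the chosen arm's loss is observed (partial information), and the loss bound is not known in advance, so a running guess is doubled whenever an observed loss exceeds it. The names `select_with_unknown_bound` and `toy_runtime`, the learning-rate schedule, and the toy runtime distributions are all assumptions made for illustration.

```python
import math
import random


def toy_runtime(arm, rng):
    """Hypothetical runtimes: arm 0 is fast, every other arm is slow."""
    if arm == 0:
        return rng.uniform(0.5, 1.5)
    return rng.uniform(20.0, 40.0)


def select_with_unknown_bound(loss_fn, n_arms, n_rounds, seed=0):
    """Exp3-style arm selection with a doubling guess for the loss bound.

    Illustrative heuristic only (assumption): not the paper's algorithm.
    """
    rng = random.Random(seed)
    bound = 1.0              # current guess for the largest possible loss
    est = [0.0] * n_arms     # importance-weighted cumulative loss estimates
    pulls = [0] * n_arms     # how often each arm was chosen
    for t in range(1, n_rounds + 1):
        # Decreasing learning rate (an assumed, commonly used schedule).
        eta = math.sqrt(math.log(n_arms) / (n_arms * t))
        # Exponential weights on estimated losses, rescaled by the current
        # bound guess; subtracting the minimum avoids underflow of all weights.
        low = min(est)
        weights = [math.exp(-eta * (e - low) / bound) for e in est]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        # Partial information: only the chosen arm's loss is observed.
        loss = loss_fn(arm, rng)
        pulls[arm] += 1
        # Unknown bound: double the guess until it covers the observed loss.
        while bound < loss:
            bound *= 2.0
        # Unbiased estimate of the chosen arm's loss.
        est[arm] += loss / probs[arm]
    return pulls, bound


if __name__ == "__main__":
    pulls, bound = select_with_unknown_bound(toy_runtime, n_arms=3, n_rounds=3000)
    print("pulls per arm:", pulls, "final bound guess:", bound)
```

On this toy problem the fast arm should receive most of the pulls, while the bound guess settles at the first power of two that covers the slowest observed run.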