We show how a standard tool from statistics, namely confidence bounds, can be used to elegantly handle situations that exhibit an exploitation-exploration trade-off. Our technique for designing and analyzing algorithms for such situations is general: it applies whenever an algorithm must make exploitation-versus-exploration decisions based on uncertain information provided by a random process. We apply the technique to two models with such a trade-off. For the adversarial bandit problem with shifting, our new algorithm suffers only $O((ST)^{1/2})$ regret with high probability over $T$ trials with $S$ shifts. Such a regret bound was previously known only in expectation. The second model is associative reinforcement learning with linear value functions, for which our technique improves the regret from $O(T^{3/4})$ to $O(T^{1/2})$.
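The algorithms analyzed in the paper, for shifting adversarial bandits and for linear value functions, are more involved than anything shown here. The sketch below illustrates only the underlying principle of acting on an upper confidence bound, in the simpler stochastic multi-armed bandit setting. It uses the standard UCB1 index, which is the empirical mean plus $\sqrt{2 \ln t / n}$. The Bernoulli arms, the helper names `ucb_index` and `run_ucb`, and the parameter values are illustrative assumptions. They are not taken from the paper.

```python
import math
import random


def ucb_index(mean: float, pulls: int, t: int) -> float:
    """Optimistic estimate: empirical mean plus a Hoeffding-style confidence width.

    Illustrative UCB1 index (an assumption for this sketch, not the paper's algorithm).
    """
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb(arm_probs, horizon, seed=0):
    """Play a hypothetical Bernoulli bandit for `horizon` rounds.

    Each round pulls the arm with the highest upper confidence bound.
    Returns the per-arm pull counts and the total reward collected.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    k = len(arm_probs)
    pulls = [0] * k
    means = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once so every confidence bound is defined
        else:
            # Exploit-or-explore in one step: the widest plausible value wins.
            arm = max(range(k), key=lambda i: ucb_index(means[i], pulls[i], t))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]  # incremental mean update
        total += reward
    return pulls, total


if __name__ == "__main__":
    counts, reward = run_ucb([0.2, 0.5, 0.8], horizon=5000)
    print(counts, reward)
```

An arm's confidence width shrinks as that arm is pulled more often. Exploration of poorly estimated arms therefore tapers off automatically, and most pulls go to the arm with the highest mean. The paper applies the same optimism-under-uncertainty idea to the harder shifting and linear settings.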