The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ. This is in contrast to the naive bound of O((n/ε²) log(n/δ)). We derive another algorithm whose complexity depends on the specific setting of the rewards, rather than the worst case setting. We also provide a matching lower bound. We show how, given an algorithm for the PAC model Multi-armed Bandit problem, one can derive a batch learning algorithm for Markov Decision Processes. This is done essentially by simulating Value Iteration, and in each iteration invoking the multi-armed bandit algorithm. Using our PAC algorithm for the multi-armed bandit problem we improve the dependence on the number of actions.
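
To make the naive bound concrete, the following is a minimal sketch of the uniform-sampling baseline, not the improved algorithm of the paper. It assumes rewards lie in [0, 1]; the per-arm sample count (2/ε²) ln(2n/δ) is derived here from Hoeffding's inequality and a union bound over the n arms, so the constant is an assumption of this sketch rather than a value taken from the paper. The names naive_sample_count, naive_pac_arm, pull, and true_means are hypothetical.

```python
import math
import random


def naive_sample_count(n_arms, epsilon, delta):
    """Pulls per arm so that, by Hoeffding's inequality and a union bound,
    every empirical mean is within epsilon/2 of its true mean with
    probability at least 1 - delta (rewards assumed to lie in [0, 1])."""
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n_arms / delta))


def naive_pac_arm(pull, n_arms, epsilon, delta):
    """Pull every arm equally often and return (best empirical arm, total pulls)."""
    m = naive_sample_count(n_arms, epsilon, delta)
    means = [sum(pull(arm) for _ in range(m)) / m for arm in range(n_arms)]
    best = max(range(n_arms), key=means.__getitem__)
    return best, m * n_arms


# Hypothetical Bernoulli bandit, used only for illustration.
rng = random.Random(0)
true_means = [0.3, 0.5, 0.45, 0.8, 0.6]


def pull(arm):
    """Draw a 0/1 reward from the chosen arm."""
    return 1.0 if rng.random() < true_means[arm] else 0.0


best, total_pulls = naive_pac_arm(pull, len(true_means), epsilon=0.1, delta=0.05)
print(best, total_pulls)
```

The total number of pulls is n times the per-arm count, which is where the log(n/δ) factor in the naive bound comes from: the union bound charges each arm a failure probability of δ/n. The improved bound stated in the abstract removes the dependence on n inside the logarithm.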