This paper presents an action selection technique for reinforcement learning in stationary Markovian environments. The technique may be used in direct algorithms such as Q-learning, or in indirect algorithms such as adaptive dynamic programming. It is based on two principles. The first is to define a local measure of uncertainty using the theory of bandit problems. We show that such a measure suffers from several drawbacks; in particular, applying it directly yields poor algorithms that are easily misled by particular configurations of the environment. The second principle was introduced to eliminate this drawback. It consists of treating the local measures of uncertainty as rewards and back-propagating them with the dynamic programming or temporal difference mechanisms. This makes it possible to reproduce global-scale reasoning about uncertainty using only local measures of it. Numerical simulations clearly demonstrate the effectiveness of these propositions.
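
The second principle lends itself to a compact sketch. The Python fragment below is illustrative only: the bonus form kappa / sqrt(1 + n), the step sizes, and the greedy combination Q + E are assumptions rather than the paper's exact formulas. It shows the core idea of maintaining, alongside the ordinary action-value function Q, a second value function E that receives the local bandit-style uncertainty measure as its "reward" and back-propagates it with the same temporal-difference mechanism.

import numpy as np

class UncertaintyBackpropAgent:
    """Sketch of back-propagated exploration bonuses (assumptions noted above)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, kappa=1.0):
        self.Q = np.zeros((n_states, n_actions))  # value of environment rewards
        self.E = np.zeros((n_states, n_actions))  # value of uncertainty "rewards"
        self.N = np.zeros((n_states, n_actions))  # visit counts per state-action pair
        self.alpha, self.gamma, self.kappa = alpha, gamma, kappa

    def local_uncertainty(self, s, a):
        # Local, bandit-style measure: shrinks as (s, a) is sampled more often.
        return self.kappa / np.sqrt(1.0 + self.N[s, a])

    def select_action(self, s):
        # Greedy with respect to the combined criterion Q + E.
        return int(np.argmax(self.Q[s] + self.E[s]))

    def update(self, s, a, r, s2):
        self.N[s, a] += 1
        # Standard Q-learning update for the environment reward.
        td = r + self.gamma * self.Q[s2].max() - self.Q[s, a]
        self.Q[s, a] += self.alpha * td
        # Same mechanism applied to the local uncertainty measure:
        # the bonus is back-propagated exactly as if it were a reward.
        bonus = self.local_uncertainty(s, a)
        td_e = bonus + self.gamma * self.E[s2].max() - self.E[s, a]
        self.E[s, a] += self.alpha * td_e

The point of the second update is that a purely local bonus only attracts the agent to immediately under-sampled actions, whereas the propagated E-values can draw it toward uncertainty that is several steps away, which is the global-scale reasoning the abstract describes.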