This paper presents an action selection technique for reinforcement learning in stationary Markovian environments. The technique may be used in direct algorithms such as Q-learning, or in indirect algorithms such as adaptive dynamic programming. It is based on two principles. The first is to define a local measure of uncertainty using the theory of bandit problems. We show that such a measure suffers from several drawbacks; in particular, applying it directly yields poor algorithms that are easily misled by particular configurations of the environment. The second principle was introduced to eliminate this drawback. It consists of treating the local measures of uncertainty as rewards and back-propagating them with the dynamic programming or temporal difference mechanisms. This makes it possible to reproduce global-scale reasoning about uncertainty using only local measures of it. Numerical simulations clearly demonstrate the effectiveness of these propositions.
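
The second principle lends itself to a compact sketch. The Python fragment below is illustrative only: the bonus form kappa / sqrt(1 + n), the step sizes, and the greedy combination Q + E are assumptions rather than the paper's exact formulas. It shows the core idea of maintaining, alongside the ordinary action-value function Q, a second value function E that receives the local bandit-style uncertainty measure as its "reward" and back-propagates it with the same temporal-difference mechanism.

import numpy as np

class UncertaintyBackpropAgent:
    """Sketch of back-propagated exploration bonuses (assumptions noted above)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, kappa=1.0):
        self.Q = np.zeros((n_states, n_actions))  # value of environment rewards
        self.E = np.zeros((n_states, n_actions))  # value of uncertainty "rewards"
        self.N = np.zeros((n_states, n_actions))  # visit counts per state-action pair
        self.alpha, self.gamma, self.kappa = alpha, gamma, kappa

    def local_uncertainty(self, s, a):
        # Local, bandit-style measure: shrinks as (s, a) is sampled more often.
        return self.kappa / np.sqrt(1.0 + self.N[s, a])

    def select_action(self, s):
        # Greedy with respect to the combined criterion Q + E.
        return int(np.argmax(self.Q[s] + self.E[s]))

    def update(self, s, a, r, s2):
        self.N[s, a] += 1
        # Standard Q-learning update for the environment reward.
        td = r + self.gamma * self.Q[s2].max() - self.Q[s, a]
        self.Q[s, a] += self.alpha * td
        # Same mechanism applied to the local uncertainty measure:
        # the bonus is back-propagated exactly as if it were a reward.
        bonus = self.local_uncertainty(s, a)
        td_e = bonus + self.gamma * self.E[s2].max() - self.E[s, a]
        self.E[s, a] += self.alpha * td_e

The point of the second update is that a purely local bonus only attracts the agent to immediately under-sampled actions, whereas the propagated E-values can draw it toward uncertainty that is several steps away, which is the global-scale reasoning the abstract describes.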