Bounded-parameter Markov decision process
Artificial Intelligence
When the transition probabilities and rewards of a Markov Decision Process (MDP) are known, an agent can compute the optimal policy without any interaction with the environment. In practice, however, exact transition probabilities are difficult for domain experts to specify. One option left to the agent is a long and potentially costly exploration of the environment. In this paper, we propose an alternative: given an initial (and possibly inaccurate) specification of the MDP, the agent determines how sensitive the optimal policy is to changes in transition probabilities and rewards. It then focuses its exploration on the regions of the state-action space to which the optimal policy is most sensitive. We show that the proposed exploration strategy performs well on several control and planning problems.
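To make the idea concrete, here is a minimal Python sketch, not the paper's algorithm: value iteration computes the optimal policy from a fully specified tabular model, and a finite-difference pass estimates how strongly the optimal values react when each state-action pair's transition distribution is perturbed; the agent would then explore the highest-sensitivity pairs first. The function names (value_iteration, sensitivity) and the mixing-toward-uniform perturbation scheme are illustrative assumptions, not details taken from the paper.

    # Minimal sketch, assuming a tabular MDP with S states and A actions.
    # The perturbation below is an illustrative stand-in for "changes in
    # transitions and rewards"; it is not the paper's method.
    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-8):
        # P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
        S, A, _ = P.shape
        V = np.zeros(S)
        while True:
            Q = R + gamma * (P @ V)        # one-step lookahead, shape (S, A)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)   # optimal values, greedy policy
            V = V_new

    def sensitivity(P, R, gamma=0.95, eps=1e-3):
        # Finite-difference estimate: nudge each (s, a) transition
        # distribution toward uniform and measure the change in V*.
        S, A, _ = P.shape
        V_base, _ = value_iteration(P, R, gamma)
        sens = np.zeros((S, A))
        for s in range(S):
            for a in range(A):
                P_pert = P.copy()
                P_pert[s, a] = (1 - eps) * P[s, a] + eps / S  # still a distribution
                V_pert, _ = value_iteration(P_pert, R, gamma)
                sens[s, a] = np.max(np.abs(V_pert - V_base)) / eps
        return sens

    # Usage: direct exploration to where the optimal values are most sensitive.
    rng = np.random.default_rng(0)
    S, A = 5, 2
    P = rng.dirichlet(np.ones(S), size=(S, A))   # random valid model, shape (S, A, S)
    R = rng.random((S, A))
    s_star, a_star = np.unravel_index(sensitivity(P, R).argmax(), (S, A))
    print(f"explore (state={s_star}, action={a_star}) first")

One design note on the sketch: mixing each transition row toward the uniform distribution keeps the perturbed model a valid MDP, so the finite difference measures sensitivity within the space of legal transition models rather than along an arbitrary direction.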