An intrinsic reward mechanism for efficient exploration

  • Authors:
  • Özgür Şimşek, Andrew G. Barto

  • Affiliations:
  • University of Massachusetts, Amherst, MA (both authors)

  • Venue:
  • ICML '06: Proceedings of the 23rd International Conference on Machine Learning
  • Year:
  • 2006

Abstract

How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exploit later? We formulate this problem as a Markov Decision Process by explicitly modeling the internal state of the agent and propose a principled heuristic for its solution. We present experimental results in a number of domains, also exploring the algorithm's use for learning a policy for a skill given its reward function: an important but neglected component of skill discovery.
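
To make the idea concrete, here is a rough sketch of intrinsic-reward-driven exploration, not the paper's exact formulation: the magnitude of the temporal-difference error on the extrinsic value estimates serves as the intrinsic reward, and a separate intrinsic value function chooses actions while the extrinsic policy is learned for later exploitation. The chain environment and all hyperparameters are illustrative assumptions.

    # A minimal sketch of intrinsic-reward-driven exploration, NOT the
    # paper's exact algorithm: the intrinsic reward is |TD error| on the
    # extrinsic estimates (a common proxy for learning progress). The
    # chain MDP and hyperparameters are illustrative assumptions.
    import random

    N_STATES = 10          # chain states 0..9; reward only at the right end
    ACTIONS = (-1, +1)     # move left or move right
    ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

    def step(s, a):
        """Chain MDP: extrinsic reward 1.0 for reaching the rightmost state."""
        s2 = min(max(s + a, 0), N_STATES - 1)
        done = (s2 == N_STATES - 1)
        return s2, (1.0 if done else 0.0), done

    # Two tables: Q_ext is the policy to exploit later;
    # Q_int drives exploration and is trained on intrinsic reward.
    Q_ext = [[0.0, 0.0] for _ in range(N_STATES)]
    Q_int = [[0.0, 0.0] for _ in range(N_STATES)]

    def greedy(Q, s):
        return 0 if Q[s][0] >= Q[s][1] else 1

    for episode in range(200):
        s, done, t = 0, False, 0
        while not done and t < 100:
            # Behave according to the intrinsic value function (epsilon-greedy).
            a = random.randrange(2) if random.random() < EPS else greedy(Q_int, s)
            s2, r_ext, done = step(s, ACTIONS[a])

            # Extrinsic TD update; its error magnitude is the intrinsic reward.
            td = r_ext + (0.0 if done else GAMMA * max(Q_ext[s2])) - Q_ext[s][a]
            Q_ext[s][a] += ALPHA * td
            r_int = abs(td)

            # Intrinsic TD update, so exploration is drawn toward states
            # where the extrinsic estimates are still changing.
            td_i = r_int + (0.0 if done else GAMMA * max(Q_int[s2])) - Q_int[s][a]
            Q_int[s][a] += ALPHA * td_i

            s, t = s2, t + 1

    # After training, exploit the extrinsic policy greedily.
    print([greedy(Q_ext, s) for s in range(N_STATES)])

The separation is the point: the behavior policy chases states where the extrinsic estimates are still changing, while the extrinsic table converges toward the policy the agent will ultimately exploit.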