Technical Note: Q-Learning
Machine Learning
Reinforcement learning with hierarchies of machines
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Artificial Intelligence
Introduction to Reinforcement Learning
Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Learning Options in Reinforcement Learning
Proceedings of the 5th International Symposium on Abstraction, Reformulation and Approximation
Equivalence notions and model minimization in Markov decision processes
Artificial Intelligence - special issue on planning with uncertainty and incomplete information
Temporal abstraction in reinforcement learning
Recent Advances in Hierarchical Reinforcement Learning
Discrete Event Dynamic Systems
Dynamic abstraction in reinforcement learning via clustering
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Metrics for finite Markov decision processes
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Identifying useful subgoals in reinforcement learning by local graph partitioning
ICML '05 Proceedings of the 22nd international conference on Machine learning
Causal Graph Based Decomposition of Factored MDPs
The Journal of Machine Learning Research
Automatic discovery and transfer of MAXQ hierarchies
Proceedings of the 25th international conference on Machine learning
Discovering options from example trajectories
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Transfer via soft homomorphisms
Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Using Homomorphisms to transfer options across continuous reinforcement learning domains
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Hierarchical reinforcement learning with the MAXQ value function decomposition
Journal of Artificial Intelligence Research
Autonomously learning an action hierarchy using a learned qualitative state representation
IJCAI'09 Proceedings of the 21st International Joint Conference on Artificial Intelligence
Transfer Learning for Reinforcement Learning Domains: A Survey
The Journal of Machine Learning Research
Optimal policy switching algorithms for reinforcement learning
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Using bisimulation for policy transfer in MDPs
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Temporally extended actions are usually effective in speeding up reinforcement learning. In this paper we present a mechanism for automatically constructing such actions, expressed as options [24], in a finite Markov Decision Process (MDP). To do this, we compute a bisimulation metric [7] between the states of a small MDP and the states of a large MDP that we wish to solve. The shape of this metric is then used to fully define a set of options for the large MDP. We demonstrate empirically that our approach improves the speed of reinforcement learning and is largely insensitive to parameter tuning.
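As a rough illustration of the quantity the abstract builds on, the sketch below iterates the fixed-point definition of the bisimulation metric of Ferns et al. [7] on a single finite MDP: d(s,t) is the maximum over actions of a reward difference plus a discounted Kantorovich (optimal-transport) distance between next-state distributions, the latter solved here as a small linear program with SciPy. This is a minimal sketch under assumed array conventions (`R` of shape states-by-actions, `P` of shape states-by-actions-by-states), not the authors' implementation, and it omits the cross-MDP comparison the paper performs.

```python
import numpy as np
from scipy.optimize import linprog


def kantorovich(p, q, d):
    """Kantorovich distance between distributions p and q under
    ground metric d, solved as an optimal-transport linear program."""
    n = len(p)
    # Equality constraints: row marginals equal p, column marginals equal q.
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0   # sum_j pi[i, j] = p[i]
        A_eq[n + i, i::n] = 1.0            # sum_i pi[i, j] = q[j]
    b_eq = np.concatenate([p, q])
    res = linprog(d.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun


def bisimulation_metric(R, P, c_r=1.0, c_t=0.9, iters=20):
    """Fixed-point iteration for a bisimulation metric on a finite MDP.

    R: (S, A) reward array; P: (S, A, S) transition probabilities.
    c_r and c_t weight the reward and transition terms, as in [7].
    """
    S, A = R.shape
    d = np.zeros((S, S))
    for _ in range(iters):
        d_new = np.zeros((S, S))
        for s in range(S):
            for t in range(s + 1, S):
                vals = [c_r * abs(R[s, a] - R[t, a])
                        + c_t * kantorovich(P[s, a], P[t, a], d)
                        for a in range(A)]
                d_new[s, t] = d_new[t, s] = max(vals)
        d = d_new
    return d
```

On a toy three-state MDP where states 0 and 1 have identical rewards and transitions, the metric assigns them distance zero, while states with differing rewards are separated; in the paper's setting, low distances between small-MDP and large-MDP states are what license transferring option structure.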