Dynamic abstraction in reinforcement learning via clustering

Authors:
Shie Mannor;Ishai Menache;Amit Hoze;Uri Klein
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA;Technion, Israel;Technion, Israel;Technion, Israel
Venue:
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Year:
2004

Citing 14
Cited 25

Algorithms for clustering data

Algorithms for clustering data
Technical Note: \cal Q-Learning

Machine Learning
Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching

Machine Learning
Approximation algorithms for NP-hard problems

Approximation algorithms for NP-hard problems
Learning hierarchical control structures for multiple tasks and changing environments

Proceedings of the fifth international conference on simulation of adaptive behavior on From animals to animats 5
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning

Artificial Intelligence
Data clustering: a review

ACM Computing Surveys (CSUR)
Neuro-Dynamic Programming

Neuro-Dynamic Programming
Recent Advances in Hierarchical Reinforcement Learning

Discrete Event Dynamic Systems
Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning

ECML '02 Proceedings of the 13th European Conference on Machine Learning
Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Feudal Reinforcement Learning

Advances in Neural Information Processing Systems 5, [NIPS Conference]
Hierarchical reinforcement learning with the MAXQ value function decomposition

Journal of Artificial Intelligence Research
Infinite-horizon policy-gradient estimation

Journal of Artificial Intelligence Research

Identifying useful subgoals in reinforcement learning by local graph partitioning

ICML '05 Proceedings of the 22nd international conference on Machine learning
An intrinsic reward mechanism for efficient exploration

ICML '06 Proceedings of the 23rd international conference on Machine learning
Causal Graph Based Decomposition of Factored MDPs

The Journal of Machine Learning Research
The utility of temporal abstraction in reinforcement learning

Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1
Subgoal Identification for Reinforcement Learning and Planning in Multiagent Problem Solving

MATES '07 Proceedings of the 5th German conference on Multiagent System Technologies
Discovering options from example trajectories

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Using Strongly Connected Components as a Basis for Autonomous Skill Acquisition in Reinforcement Learning

ISNN '09 Proceedings of the 6th International Symposium on Neural Networks on Advances in Neural Networks
Learning Representation and Control in Markov Decision Processes: New Frontiers

Foundations and Trends® in Machine Learning
Samuel meets Amarel: automating value function approximation using global state space analysis

AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
Towards competence in autonomous agents

AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 4
State similarity based approach for improving performance in RL

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Automatic abstraction in reinforcement learning using data mining techniques

Robotics and Autonomous Systems
State abstraction discovery from irrelevant state variables

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Bayesian network-based behavior control for Skilligent robots

ICRA'09 Proceedings of the 2009 IEEE international conference on Robotics and Automation
Learning of subgoals for goal-oriented behavior control of mobile robots

ICONIP'08 Proceedings of the 15th international conference on Advances in neuro-information processing - Volume Part I
Automatic discovery of subgoals in reinforcement learning using strongly connected components

ICONIP'08 Proceedings of the 15th international conference on Advances in neuro-information processing - Volume Part I
Finding and transferring policies using stored behaviors

Autonomous Robots
Automatic discovery of subgoals based on improved FCM clustering

AICI'11 Proceedings of the Third international conference on Artificial intelligence and computational intelligence - Volume Part II
Learning skills in reinforcement learning using relative novelty

SARA'05 Proceedings of the 6th international conference on Abstraction, Reformulation and Approximation
Automatic construction of temporally extended actions for MDPs using bisimulation metrics

EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Transfer in reinforcement learning via shared features

The Journal of Machine Learning Research
Beyond reward: the problem of knowledge and data

ILP'11 Proceedings of the 21st international conference on Inductive Logic Programming
Abstraction in Model Based Partially Observable Reinforcement Learning Using Extended Sequence Trees

WI-IAT '12 Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 02
Study of similar motion imitation of stepping upstairs for humanoid robot

International Journal of Computing Science and Mathematics
Automatic skill acquisition in reinforcement learning using graph centrality measures

Intelligent Data Analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

We consider a graph theoretic approach for automatic construction of options in a dynamic environment. A map of the environment is generated on-line by the learning agent, representing the topological structure of the state transitions. A clustering algorithm is then used to partition the state space to different regions. Policies for reaching the different parts of the space are separately learned and added to the model in a form of options (macro-actions). The options are used for accelerating the Q-Learning algorithm. We extend the basic algorithm and consider building a map that includes preliminary indication of the location of "interesting" regions of the state space, where the value gradient is significant and additional exploration might be beneficial. Experiments indicate significant speedups, especially in the initial learning phase.