Learning to generalize and reuse skills using approximate partial policy homomorphisms

Authors:
Srividhya Rajendran;Manfred Huber
Affiliations:
Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, Texas;Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, Texas
Venue:
SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics
Year:
2009



Abstract

A reinforcement learning (RL) agent that performs successfully in a complex and dynamic environment has to continuously learn and adapt to perform new tasks. This requires the agent not only to extract control and representational knowledge from the tasks it has learned, but also to reuse that knowledge to learn new tasks. This paper presents a policy generalization approach that uses the novel concept of policy homomorphism to extract this control and representational knowledge. The paper further extends the policy homomorphism framework to approximate policy homomorphisms. This extension allows the policy generalization framework to efficiently address more realistic tasks and environments in nondeterministic domains. The approximate policy homomorphism derives an abstract policy for a set of similar tasks (a task type) from a set of basic policies learned for previously seen task instances. The resulting generalized policy is then applied in new contexts to address new instances of related tasks. The approach also makes it possible to identify similar tasks based on the functional characteristics of the corresponding skills, and it provides a means of transferring the learned knowledge to new situations without requiring complete knowledge of the state space and the system dynamics in the new environment. We demonstrate policy abstraction using approximate policy homomorphisms and illustrate policy reuse for learning new tasks in novel situations using a set of grid world examples.
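
The idea of deriving one abstract policy from several task-specific policies and reusing it on a new task instance can be illustrated with a toy grid world. The sketch below is not the paper's algorithm: the state mapping (sign of the offset to the goal), the majority-vote aggregation, the agreement score, and all function names are assumptions chosen only for illustration, and the "learned" policies are hand-written stand-ins, one of which is perturbed so that the mapping holds only approximately.

```python
from collections import Counter, defaultdict

SIZE = 5  # hypothetical 5x5 grid world
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def greedy_policy(goal):
    """Stand-in for a learned policy: close the x offset first, then y."""
    policy = {}
    for x in range(SIZE):
        for y in range(SIZE):
            if (x, y) == goal:
                continue
            if x != goal[0]:
                policy[(x, y)] = "E" if goal[0] > x else "W"
            else:
                policy[(x, y)] = "N" if goal[1] > y else "S"
    return policy

def abstract_state(state, goal):
    """Assumed state mapping: sign of the goal offset along each axis."""
    sign = lambda v: (v > 0) - (v < 0)
    return (sign(goal[0] - state[0]), sign(goal[1] - state[1]))

def generalize(policies_by_goal):
    """Majority-vote an abstract policy and report how well it agrees
    with the concrete policies (1.0 would be an exact mapping)."""
    votes = defaultdict(Counter)
    for goal, policy in policies_by_goal.items():
        for state, action in policy.items():
            votes[abstract_state(state, goal)][action] += 1
    abstract_policy = {z: c.most_common(1)[0][0] for z, c in votes.items()}
    agree = sum(c[abstract_policy[z]] for z, c in votes.items())
    total = sum(sum(c.values()) for c in votes.values())
    return abstract_policy, agree / total

def rollout(abstract_policy, start, goal, max_steps=50):
    """Reuse the abstract policy on a new goal; return steps taken or None."""
    state = start
    for step in range(max_steps):
        if state == goal:
            return step
        dx, dy = MOVES[abstract_policy[abstract_state(state, goal)]]
        state = (state[0] + dx, state[1] + dy)
    return None

# Two previously seen task instances (goals); one policy deviates in one state.
policies = {(1, 1): greedy_policy((1, 1)), (3, 3): greedy_policy((3, 3))}
policies[(3, 3)][(0, 0)] = "N"  # perturbation: mapping is now only approximate

abstract_policy, agreement = generalize(policies)
steps = rollout(abstract_policy, start=(0, 4), goal=(4, 0))  # unseen goal
print(f"agreement={agreement:.3f}, steps to new goal={steps}")
```

In this sketch the agreement score plays the role of a tolerance check: a value below 1.0 means some concrete state-action pairs disagree with the abstract policy, which is the situation an approximate mapping is meant to tolerate. The rollout needs only the goal-relative abstraction of the current state, not a model of the new task's dynamics.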