A reinforcement learning (RL) agent that is to perform successfully in a complex and dynamic environment has to continuously learn and adapt to new tasks. This requires the agent not only to extract control and representation knowledge from the tasks it has already learned, but also to reuse that knowledge when learning new tasks. This paper presents a new method for extracting such control and representation knowledge: a policy generalization approach that uses the novel concept of policy homomorphism to derive the abstractions. The paper further extends the policy homomorphism framework to approximate policy homomorphisms, which allows the policy generalization framework to efficiently address more realistic tasks and environments in nondeterministic domains. The approximate policy homomorphism derives an abstract policy for a set of similar tasks (a task type) from a set of basic policies learned for previously seen task instances. The resulting generalized policy can then be applied in new contexts to address new instances of related tasks. The approach also makes it possible to identify similar tasks based on the functional characteristics of the corresponding skills, and it provides a means of transferring the learned knowledge to new situations without requiring complete knowledge of the state space and the system dynamics in the new environment. We demonstrate policy abstraction using approximate policy homomorphisms and illustrate policy reuse for learning new tasks in novel situations on a set of grid world examples.
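The abstract describes the lifting step only at a high level. As a purely illustrative sketch of the general idea, not the paper's actual construction, the following Python code derives an abstract policy for a task type by mapping the states and actions of several concrete grid-world policies through an assumed abstraction map, and keeps an abstract action only where the lifted policies approximately agree. All names here (`lift_policies`, `state_map`, `action_map`, `agreement`) are hypothetical.

```python
from collections import Counter, defaultdict

def lift_policies(policies, state_map, action_map, agreement=0.9):
    """Illustrative sketch: derive an abstract policy for a task type.

    policies   : one dict per solved task instance, mapping state -> action.
    state_map  : f(task_index, state) -> abstract state (the state component
                 of the assumed homomorphism; given here, not learned).
    action_map : g(task_index, action) -> abstract action.
    agreement  : fraction of lifted (state, action) pairs that must agree
                 before an abstract action is accepted -- the 'approximate'
                 part of the homomorphism.
    """
    votes = defaultdict(Counter)
    for i, policy in enumerate(policies):
        for s, a in policy.items():
            votes[state_map(i, s)][action_map(i, a)] += 1

    abstract_policy = {}
    for s_abs, counts in votes.items():
        a_abs, n = counts.most_common(1)[0]
        if n / sum(counts.values()) >= agreement:
            abstract_policy[s_abs] = a_abs  # lifted policies agree enough
    return abstract_policy

# Toy demo: two corridor tasks whose goals sit at cells 4 and 7.
# The abstraction keeps only the sign of the offset to the goal, so the
# lifted policy transfers to any goal position in a new corridor.
goals = [4, 7]
sign_to_goal = lambda i, s: (s > goals[i]) - (s < goals[i])  # -1, 0, or +1
same_action = lambda i, a: a
pi0 = {s: ('R' if s < goals[0] else 'L') for s in range(9) if s != goals[0]}
pi1 = {s: ('R' if s < goals[1] else 'L') for s in range(9) if s != goals[1]}
print(lift_policies([pi0, pi1], sign_to_goal, same_action))
# -> {-1: 'R', 1: 'L'}: move right while left of the goal, left otherwise
```

In this toy version the abstraction map is supplied by hand; the paper's contribution lies in obtaining and applying such abstractions without complete knowledge of the new environment's state space and dynamics, which the sketch above does not attempt.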