Automatic state abstraction from demonstration
IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two
Both Learning from Demonstration (LfD) and Reinforcement Learning (RL) are popular approaches for building decision-making agents. LfD applies supervised learning to a set of human demonstrations to infer and imitate the human policy, while RL uses only a reward signal and exploration to find an optimal policy. For complex tasks, both techniques can be ineffective: LfD may require more demonstrations than is feasible to obtain, and RL can take an impractically long time to converge. We present Automatic Decomposition and Abstraction from demonstration (ADA), an algorithm that uses mutual information measures over a set of human demonstrations to decompose a sequential decision process into subtasks and to find a state abstraction for each subtask. ADA then projects the human demonstrations into the abstracted state space to build a policy. This policy can later be improved with RL algorithms to surpass the performance of the human teacher. We find empirically that ADA produces satisficing policies for problems too complex to be solved with traditional LfD and RL algorithms. In particular, we show that mutual information across state features, estimated from human demonstrations, can be used to discover subtasks and abstractions in sequential decision processes, thereby reducing the effects of the curse of dimensionality.
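The abstract describes the core mechanism: estimate mutual information between each state feature and the demonstrated actions, keep only the informative features as the abstract state space, and project the demonstrations onto it. The sketch below illustrates that idea under simplifying assumptions (discrete features and actions, a single global abstraction rather than per-subtask abstractions, an arbitrary MI threshold); all names and the threshold are illustrative, not the authors' implementation.

    # Minimal sketch of MI-based state abstraction from demonstrations.
    # Assumes discrete state features and discrete actions; threshold is arbitrary.
    from collections import Counter
    import math

    def mutual_information(xs, ys):
        """Empirical mutual information I(X; Y) in bits for two aligned discrete sequences."""
        n = len(xs)
        px, py = Counter(xs), Counter(ys)
        pxy = Counter(zip(xs, ys))
        mi = 0.0
        for (x, y), c in pxy.items():
            p_xy = c / n
            mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
        return mi

    def select_abstraction(states, actions, threshold=0.05):
        """Return indices of state features whose MI with the action exceeds threshold.

        states  -- list of feature tuples, one per demonstration step
        actions -- list of demonstrated actions, aligned with states
        """
        n_features = len(states[0])
        return [
            i for i in range(n_features)
            if mutual_information([s[i] for s in states], actions) > threshold
        ]

    def project(states, relevant):
        """Project demonstration states onto the selected (relevant) features."""
        return [tuple(s[i] for i in relevant) for s in states]

Usage would follow the pipeline in the abstract: split demonstrations into (state, action) pairs, select the relevant features, project the states, and fit a policy (e.g., a lookup table or classifier) in the abstracted space, which RL can then refine.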