Heuristics: intelligent search strategies for computer problem solving
Enhancing transfer in reinforcement learning by building stochastic models of robot actions
ML92 Proceedings of the ninth international workshop on Machine learning
Learning by analogical reasoning in general problem-solving
Deliberation scheduling for problem solving in time-constrained environments
Artificial Intelligence
A data intensive computing approach to path planning and mode management for hybrid systems
Proceedings of the DIMACS/SYCON Workshop on Hybrid Systems III: Verification and Control
An Algorithm for Finding Best Matches in Logarithmic Expected Time
ACM Transactions on Mathematical Software (TOMS)
Introduction to Reinforcement Learning
Interactive control of avatars animated with human motion data
Proceedings of the 29th annual conference on Computer graphics and interactive techniques
Variable Resolution Discretization in Optimal Control
Machine Learning
A Heuristic Approach to the Discovery of Macro-Operators
Machine Learning
Chunking in Soar: The Anatomy of a General Learning Mechanism
Machine Learning
Learning Options in Reinforcement Learning
Proceedings of the 5th International Symposium on Abstraction, Reformulation and Approximation
Autonomous discovery of temporal abstractions from interaction with an environment
Using relative novelty to identify useful temporal abstractions in reinforcement learning
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Dynamic abstraction in reinforcement learning via clustering
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Evaluating motion graphs for character navigation
SCA '04 Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation
Learning from observation using primitives
Behavior planning for character animation
Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation
Planning Algorithms
Optimal Rough Terrain Trajectory Generation for Wheeled Mobile Robots
International Journal of Robotics Research
Learning Control Knowledge for Forward Search Planning
The Journal of Machine Learning Research
Finding and transferring policies using stored behaviors
A multirange architecture for collision-free off-road robot navigation
Journal of Field Robotics - Special Issue on LAGR Program, Part I
Machine learning for fast quadrupedal locomotion
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence
Journal of Artificial Intelligence Research
Generalizing plans to new environments in relational MDPs
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
SMDP homomorphisms: an algebraic approach to abstraction in semi-Markov decision processes
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
We present several algorithms that aim to advance the state of the art in reinforcement learning and planning. One key idea is to transfer knowledge across problems by representing it using local features; we use this idea to speed up dynamic-programming-based generalized policy iteration.

We then present a control approach that uses a library of trajectories to establish a control law or policy. This approach is an alternative both to finding policies from value functions computed by dynamic programming and to following plans based on a single desired trajectory. Our method provides reasonable policies much faster than dynamic programming, and more robust and global policies than following a single desired trajectory.

Finally, we show how local features can be used to transfer libraries of trajectories between similar problems. Transfer makes it worthwhile to store special-purpose behaviors in the library for solving tricky situations in new environments, and adapting the behaviors in the library increases their applicability. Our approach can be viewed as a way for planning algorithms to make use of special-purpose behaviors/actions that are applicable only in certain situations.

Results are shown for the "Labyrinth" marble maze and the LittleDog quadruped robot. The marble maze is a difficult task that requires both fast control and planning ahead. In the LittleDog task, a quadruped robot must navigate quickly across rough terrain.
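The core of the trajectory-library idea can be sketched as a nearest-neighbor policy: store (state, action) pairs from a set of trajectories, and at runtime apply the action of the stored state closest to the current state. The sketch below is a minimal, hypothetical illustration (the class and method names are not from the work itself); in practice a kd-tree gives logarithmic expected lookup time, but a linear scan keeps the example dependency-free.

```python
import math


class TrajectoryLibrary:
    """Hypothetical sketch of a trajectory-library policy: store
    (state, action) pairs from trajectories, then act by looking up
    the nearest stored state (Euclidean distance)."""

    def __init__(self):
        self.states = []   # list of state tuples
        self.actions = []  # action recorded at each state

    def add_trajectory(self, trajectory):
        # trajectory: iterable of (state, action) pairs
        for state, action in trajectory:
            self.states.append(state)
            self.actions.append(action)

    def policy(self, query):
        # Nearest-neighbor lookup over all stored states; a kd-tree
        # would reduce this to logarithmic expected time.
        best = min(range(len(self.states)),
                   key=lambda i: math.dist(self.states[i], query))
        return self.actions[best]


lib = TrajectoryLibrary()
lib.add_trajectory([((0.0, 0.0), "north"), ((0.0, 1.0), "east")])
lib.add_trajectory([((1.0, 1.0), "stop")])
print(lib.policy((0.1, 0.9)))  # nearest stored state is (0.0, 1.0) -> east
```

Because the policy is defined everywhere the library has coverage, it behaves more like a global value-function policy than like tracking a single desired trajectory, which is the advantage the abstract describes.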