Using inaccurate models in reinforcement learning

  • Authors: Pieter Abbeel; Morgan Quigley; Andrew Y. Ng
  • Affiliations: Stanford University, Stanford, CA (all authors)
  • Venue: ICML '06: Proceedings of the 23rd International Conference on Machine Learning
  • Year: 2006

Abstract

In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or "simulator") of the Markov decision process. However, for high-dimensional continuous-state tasks, it can be extremely difficult to build an accurate model, and the algorithm thus often returns a policy that works in simulation but not in real life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model and only a small number of real-life trials. The key idea is to successively "ground" the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that, when given only a crude model and a small number of real-life trials, our algorithm can obtain near-optimal performance in the real system.
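
The "ground the evaluations with real trials, let the approximate model suggest local changes" loop described in the abstract can be pictured with a small sketch. Everything below is an illustrative assumption rather than the paper's actual algorithm or notation: the toy 2-D dynamics, the linear policy, the quadratic-style reward, the finite-difference policy update, and names such as `real_step`, `model_step`, and `grounded_policy_search` are all made up for this example. The sketch only mirrors the three steps the abstract states: run a real trial, correct the model so it reproduces that trial under the current policy, and take a local policy-improvement step in the corrected model.

```python
import numpy as np

H = 20        # rollout horizon (illustrative)
STEP = 0.1    # size of each normalized policy-parameter step (illustrative)
N_ITERS = 20  # number of real-life trials used (illustrative)

def policy(theta, x):
    """Linear policy u = theta^T x (an assumed, illustrative policy class)."""
    return float(theta @ x)

def real_step(x, u):
    """Stand-in for the real system, reachable only through trials."""
    return np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u - 0.02 * x[1]])

def model_step(x, u):
    """Crude approximate model: ignores the damping term of the real system."""
    return np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])

def rollout(step_fn, theta, x0, corrections=None):
    """Roll out a policy; optionally add per-time-step grounding corrections."""
    xs, x = [x0], x0
    for t in range(H):
        x = step_fn(x, policy(theta, x))
        if corrections is not None:
            x = x + corrections[t]
        xs.append(x)
    return xs

def total_reward(xs):
    """Illustrative reward: drive the state toward the origin."""
    return -sum(float(x @ x) for x in xs)

def grounded_policy_search(theta, x0):
    for _ in range(N_ITERS):
        # (1) One real-life trial with the current policy.
        real_xs = rollout(real_step, theta, x0)

        # (2) Ground the model: per-time-step corrections chosen so the corrected
        #     model exactly reproduces the real trajectory under the current policy.
        corrections = [xr_next - model_step(xr, policy(theta, xr))
                       for xr, xr_next in zip(real_xs[:-1], real_xs[1:])]

        # (3) Let the corrected model suggest a local policy change
        #     (a finite-difference gradient step, purely for illustration).
        base = total_reward(rollout(model_step, theta, x0, corrections))
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            t2 = theta.copy()
            t2[i] += 1e-3
            grad[i] = (total_reward(rollout(model_step, t2, x0, corrections)) - base) / 1e-3
        theta = theta + STEP * grad / (np.linalg.norm(grad) + 1e-8)
    return theta

theta = grounded_policy_search(np.zeros(2), np.array([1.0, 0.0]))
print("final policy parameters:", theta)
```

Note that, because the corrections are computed along the real trajectory, the corrected model's rollout under the current policy matches the real trial exactly, so the policy evaluation at the current policy is grounded in real data; the approximate model is trusted only for the small local change taken in each iteration.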