TEXPLORE: real-time sample-efficient reinforcement learning for robots

Authors:
Todd Hester; Peter Stone
Affiliations:
Department of Computer Science, The University of Texas at Austin, Austin, USA 78712; Department of Computer Science, The University of Texas at Austin, Austin, USA 78712
Venue:
Machine Learning
Year:
2013

Abstract

The use of robots in society could be expanded by using reinforcement learning (RL) to allow robots to learn and adapt to new situations online. RL is a paradigm for learning sequential decision-making tasks, usually formulated as a Markov Decision Process (MDP). For an RL algorithm to be practical for robotic control tasks, it must learn from very few samples while continually taking actions in real time. In addition, the algorithm must learn efficiently in the face of noise, sensor/actuator delays, and continuous state features. In this article, we present TEXPLORE, the first algorithm to address all of these challenges together. TEXPLORE is a model-based RL method that learns a random forest model of the domain, which generalizes dynamics to unseen states. The agent explores states that are promising for the final policy, while ignoring states that do not appear promising. With sample-based planning and a novel parallel architecture, TEXPLORE can select actions continually in real time whenever necessary. We empirically evaluate the importance of each component of TEXPLORE in isolation and then demonstrate the complete algorithm learning to control the velocity of an autonomous vehicle in real time.
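
To make the model-based loop described in the abstract concrete, the following is a minimal sketch under loudly stated assumptions, not the authors' implementation. It swaps in much simpler components: a tabular lookup model instead of a random forest, full value iteration instead of sample-based planning, and R-MAX-style optimism for untried actions instead of the targeted exploration the abstract describes. It also runs in a single thread, so it does not illustrate the parallel architecture. The toy chain domain and every name in it (`env_step`, `plan`, `greedy`, `run`, `R_OPT`) are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: a tabular model and value iteration stand in
# for a learned random forest model and a sample-based planner.

N_STATES = 5          # hypothetical chain domain: states 0..4, goal at 4
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor
R_OPT = 2.0           # optimistic reward assumed for untried state-actions


def env_step(state, action):
    """True dynamics of the toy domain, hidden from the agent."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def plan(model, sweeps=100):
    """Value iteration over the learned model.

    State-action pairs that have never been tried keep an optimistic
    value, which pulls the agent toward them (R-MAX-style exploration).
    """
    v_max = R_OPT / (1.0 - GAMMA)
    q = {(s, a): v_max for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for (s, a), (s2, r) in model.items():
            q[(s, a)] = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
    return q


def greedy(q, state):
    """Pick the action with the highest planned value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def run(steps=30):
    """Interleave acting, model updates, and replanning."""
    model = {}        # (state, action) -> (next_state, reward)
    state = 0
    for _ in range(steps):
        q = plan(model)                        # replan on the current model
        action = greedy(q, state)              # act on the plan
        next_state, reward = env_step(state, action)
        model[(state, action)] = (next_state, reward)  # update the model
        state = next_state
    return model, plan(model)


model, q = run()
policy = [greedy(q, s) for s in range(N_STATES)]
print(policy)
```

In this toy setting the agent tries each state-action pair once, after which the greedy policy moves right in every state. A tabular model like this one cannot generalize to unseen states, which is the role the abstract assigns to the random forest model.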