Efficient Continuous-Time Reinforcement Learning with Adaptive State Graphs

Authors:
Gerhard Neumann;Michael Pfeiffer;Wolfgang Maass
Affiliations:
Institute for Theoretical Computer Science, Graz University of Technology, A-8010 Graz, Austria;Institute for Theoretical Computer Science, Graz University of Technology, A-8010 Graz, Austria;Institute for Theoretical Computer Science, Graz University of Technology, A-8010 Graz, Austria
Venue:
ECML '07 Proceedings of the 18th European Conference on Machine Learning
Year:
2007

Abstract

We present a new reinforcement learning approach for deterministic continuous control problems in environments with unknown, arbitrary reward functions. The difficulty of finding solution trajectories for such problems can be reduced by incorporating limited prior knowledge of the approximate local system dynamics. The algorithm builds an adaptive state graph of sample points within the continuous state space. The nodes of the graph are generated by an efficient, principled exploration scheme that directs the agent towards promising regions while maintaining good online performance. Global solution trajectories are formed as combinations of local controllers that connect nodes of the graph, which naturally allows continuous actions and continuous time steps. We demonstrate our approach on various movement planning tasks in continuous domains.
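The idea of forming a global trajectory from local connections between graph nodes can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a fixed set of 2D sample points, a hypothetical local controller that can follow the straight segment between any two nodes closer than `radius`, and an edge cost equal to the segment length. The names `build_graph` and `shortest_path` are illustrative only, and the exploration scheme and reward handling described in the abstract are left out.

```python
import heapq
import math


def build_graph(nodes, radius):
    """Connect every pair of sample points no farther apart than `radius`.

    Assumption: a local controller can drive the system along the straight
    segment between two nearby nodes, at a cost equal to its length.
    """
    edges = {i: [] for i in range(len(nodes))}
    for i, p in enumerate(nodes):
        for j, q in enumerate(nodes):
            if i != j:
                d = math.dist(p, q)
                if d <= radius:
                    edges[i].append((j, d))
    return edges


def shortest_path(edges, start, goal):
    """Dijkstra search; returns (total_cost, node indices) or (inf, [])."""
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        cost, u = heapq.heappop(queue)
        if u == goal:
            # Walk parent links back to the start to recover the path.
            path = [u]
            while path[-1] != start:
                path.append(parent[path[-1]])
            return cost, path[::-1]
        if cost > best.get(u, math.inf):
            continue  # stale queue entry
        for v, w in edges[u]:
            new_cost = cost + w
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                parent[v] = u
                heapq.heappush(queue, (new_cost, v))
    return math.inf, []


# Hypothetical sample points; the last one is too far away to be connected.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (5.0, 5.0)]
edges = build_graph(nodes, radius=1.1)
cost, path = shortest_path(edges, start=0, goal=3)
print(cost, path)
```

In this sketch the global solution is simply the sequence of node indices returned by the search; each consecutive pair would be handed to the assumed local controller, which is where continuous actions and variable time steps would enter.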