Space-indexed dynamic programming: learning to follow trajectories

  • Authors:
  • J. Zico Kolter, Adam Coates, Andrew Y. Ng, Yi Gu, Charles DuHadway

  • Affiliations:
  • Stanford University, CA (all authors)

  • Venue:
  • Proceedings of the 25th International Conference on Machine Learning (ICML)
  • Year:
  • 2008

Abstract

We consider the task of learning to accurately follow a trajectory in a vehicle such as a car or helicopter. A number of dynamic programming algorithms, such as Differential Dynamic Programming (DDP) and Policy Search by Dynamic Programming (PSDP), can efficiently compute non-stationary policies for these tasks; such policies are in general well-suited to trajectory following, since they can easily generate different control actions at different times in order to follow the trajectory. However, a weakness of these algorithms is that their policies are time-indexed, in that they apply different policies depending on the current time. This is problematic because (1) the current time may not correspond well to where we are along the trajectory, and (2) the resulting uncertainty over states can prevent these algorithms from finding any good policy at all. In this paper we propose a method for space-indexed dynamic programming that overcomes both of these difficulties. We begin by showing how a dynamical system can be rewritten in terms of a spatial index variable (i.e., how far along the trajectory we are) rather than as a function of time. We then use these space-indexed dynamical systems to derive space-indexed versions of the DDP and PSDP algorithms. Finally, we show that these algorithms perform well on a variety of control tasks, both in simulation and on real systems.
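
The reindexing idea in the abstract can be sketched concretely: instead of advancing the dynamics by a fixed time increment, one simulates the time-indexed system just long enough for the state's projection onto the reference trajectory to advance by one spatial segment. The following is a minimal illustration only; the function and variable names (space_indexed_step, waypoints), the Euler integrator, and the nearest-waypoint projection are all assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def space_indexed_step(f, x, u, waypoints, d, dt=0.01, max_steps=10000):
    """Simulate the time-indexed dynamics x_dot = f(x, u) until the state's
    closest waypoint on the reference trajectory advances past index d.
    This makes one 'unit of space' along the trajectory, rather than one
    unit of time, play the role of a single dynamic-programming stage.
    (Illustrative sketch; not the paper's exact construction.)
    """
    k = waypoints.shape[1]                 # dimensionality of the position
    for _ in range(max_steps):
        x = x + dt * f(x, u)               # one Euler step of the time dynamics
        nearest = np.argmin(np.linalg.norm(waypoints - x[:k], axis=1))
        if nearest > d:                    # crossed into the next segment
            return x
    return x                               # gave up: vehicle made no spatial progress

# Usage sketch: a unicycle-like model whose first two state entries are (x, y)
def f(x, u):
    theta, v = x[2], u[0]
    return np.array([v * np.cos(theta), v * np.sin(theta), u[1]])

waypoints = np.stack([np.linspace(0, 5, 50), np.zeros(50)], axis=1)
x0 = np.array([0.0, 0.1, 0.0])
x1 = space_indexed_step(f, x0, np.array([1.0, 0.0]), waypoints, d=0)
```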
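Given such a space-indexed step, a dynamic-programming pass can run over spatial stages rather than time steps. The sketch below shows a PSDP-flavored backward loop under the same illustrative assumptions; to keep it short it uses a degenerate policy class (one constant control per trajectory segment) and user-supplied cost and state-sampling functions, which are hypothetical stand-ins rather than the paper's algorithm. It reuses space_indexed_step from the previous sketch.

```python
import numpy as np

def space_indexed_psdp(f, waypoints, candidate_controls, cost, sample_states,
                       n_samples=100):
    """PSDP-flavored backward pass over spatial stages: for each segment d
    (last to first), pick the control minimizing sampled cost-to-go under
    the already-chosen later-stage policies. Illustrative sketch only.
    """
    D = len(waypoints) - 1
    policy = [None] * D                    # one control choice per segment

    def rollout_cost(x, d):
        # Follow the chosen per-segment controls from segment d to the end,
        # accumulating a user-supplied cost against each next waypoint.
        total = 0.0
        for j in range(d, D):
            x = space_indexed_step(f, x, policy[j], waypoints, j)
            total += cost(x, waypoints[j + 1])
        return total

    for d in reversed(range(D)):
        samples = [sample_states(d) for _ in range(n_samples)]
        best_u, best_cost = None, np.inf
        for u in candidate_controls:
            policy[d] = u                  # tentatively install this control
            c = sum(rollout_cost(x, d) for x in samples)
            if c < best_cost:
                best_u, best_cost = u, c
        policy[d] = best_u                 # commit the best control for stage d
    return policy
```

The design point this sketch is meant to convey is the one in the abstract: because stages are indexed by progress along the trajectory, the stage-d policy is applied wherever the vehicle actually is in space, rather than at a fixed clock time.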