Variable risk control via stochastic optimization

  • Authors:
  • Scott R. Kuindersma; Roderic A. Grupen; Andrew G. Barto

  • Affiliations:
  • Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA, Department of Computer Science, University of Massachusetts Amherst, Amherst, MA ...; Department of Computer Science, University of Massachusetts Amherst, Amherst, MA, USA; Department of Computer Science, University of Massachusetts Amherst, Amherst, MA, USA

  • Venue:
  • International Journal of Robotics Research
  • Year:
  • 2013

Abstract

We present new global and local policy search algorithms suitable for problems with policy-dependent cost variance (or risk), a property present in many robot control tasks. These algorithms exploit new techniques in non-parametric heteroscedastic regression to directly model the policy-dependent distribution of cost. For local search, the learned cost model can be used as a critic for performing risk-sensitive gradient descent. Alternatively, decision-theoretic criteria can be applied to globally select policies to balance exploration and exploitation in a principled way, or to perform greedy minimization with respect to various risk-sensitive criteria. This separation of learning and policy selection permits variable risk control, where risk-sensitivity can be flexibly adjusted and appropriate policies can be selected at runtime without relearning. We describe experiments in dynamic stabilization and manipulation with a mobile manipulator that demonstrate learning of flexible, risk-sensitive policies in very few trials.
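The abstract's central idea is that learning the policy-dependent cost distribution is separated from policy selection, so risk sensitivity can be adjusted at runtime without relearning. The sketch below is a minimal illustration of that idea, not the authors' algorithm: it assumes a Nadaraya-Watson kernel estimator for the policy-dependent cost mean and variance, a risk-sensitive criterion J(theta) = mu(theta) + kappa * sigma(theta), and a toy one-dimensional policy space; the function names, bandwidth, and cost function are hypothetical.

```python
# Minimal sketch (illustrative assumptions only) of variable risk policy
# selection: fit a non-parametric heteroscedastic cost model over policy
# parameters, then greedily pick a policy under J = mu + kappa * sigma,
# where the risk coefficient kappa is chosen at runtime.
import numpy as np

def kernel_weights(X, x, bandwidth=0.3):
    """Gaussian kernel weights of query policy x against observed policies X."""
    d2 = np.sum((X - x) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    return w / (w.sum() + 1e-12)

def heteroscedastic_fit(X, costs, x):
    """Nadaraya-Watson estimates of the policy-dependent cost mean and std."""
    w = kernel_weights(X, x)
    mu = np.dot(w, costs)
    var = np.dot(w, (costs - mu) ** 2)   # input-dependent (heteroscedastic) variance
    return mu, np.sqrt(var)

def select_policy(X, costs, candidates, kappa):
    """Greedy risk-sensitive selection: minimize mu + kappa * sigma."""
    scores = [heteroscedastic_fit(X, costs, c) for c in candidates]
    J = np.array([mu + kappa * sd for mu, sd in scores])
    return candidates[int(np.argmin(J))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 1-D policy space: both the mean cost and the noise depend on theta.
    X = rng.uniform(0.0, 1.0, size=(60, 1))
    noise = 0.05 + 0.4 * X[:, 0]
    costs = (X[:, 0] - 0.7) ** 2 + noise * rng.standard_normal(60)

    candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
    for kappa in (0.0, 1.0, 3.0):        # vary risk sensitivity at runtime
        theta = select_policy(X, costs, candidates, kappa)
        print(f"kappa={kappa:.1f} -> selected theta={theta[0]:.2f}")
```

Because the learned cost model is independent of the selection criterion, changing kappa only re-scores the candidate policies; no relearning is needed, which mirrors the variable risk property the abstract describes.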