Variable risk control via stochastic optimization

  • Authors:
  • Scott R. Kuindersma; Roderic A. Grupen; Andrew G. Barto

  • Affiliations:
  • Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA, Department of Computer Science, University of Massachusetts Amherst, Amherst, MA ...; Department of Computer Science, University of Massachusetts Amherst, Amherst, MA, USA; Department of Computer Science, University of Massachusetts Amherst, Amherst, MA, USA

  • Venue:
  • International Journal of Robotics Research
  • Year:
  • 2013

Abstract

We present new global and local policy search algorithms suitable for problems with policy-dependent cost variance (or risk), a property present in many robot control tasks. These algorithms exploit new techniques in non-parametric heteroscedastic regression to directly model the policy-dependent distribution of cost. For local search, the learned cost model can be used as a critic for performing risk-sensitive gradient descent. Alternatively, decision-theoretic criteria can be applied to globally select policies to balance exploration and exploitation in a principled way, or to perform greedy minimization with respect to various risk-sensitive criteria. This separation of learning and policy selection permits variable risk control, where risk-sensitivity can be flexibly adjusted and appropriate policies can be selected at runtime without relearning. We describe experiments in dynamic stabilization and manipulation with a mobile manipulator that demonstrate learning of flexible, risk-sensitive policies in very few trials.
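The abstract's central idea is that learning the policy-dependent cost distribution is separated from policy selection, so risk sensitivity can be adjusted at runtime without relearning. The sketch below is a minimal illustration of that idea, not the authors' algorithm: it assumes a Nadaraya-Watson kernel estimator for the policy-dependent cost mean and variance, a risk-sensitive criterion J(theta) = mu(theta) + kappa * sigma(theta), and a toy one-dimensional policy space; the function names, bandwidth, and cost function are hypothetical.

```python
# Minimal sketch (illustrative assumptions only) of variable risk policy
# selection: fit a non-parametric heteroscedastic cost model over policy
# parameters, then greedily pick a policy under J = mu + kappa * sigma,
# where the risk coefficient kappa is chosen at runtime.
import numpy as np

def kernel_weights(X, x, bandwidth=0.3):
    """Gaussian kernel weights of query policy x against observed policies X."""
    d2 = np.sum((X - x) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    return w / (w.sum() + 1e-12)

def heteroscedastic_fit(X, costs, x):
    """Nadaraya-Watson estimates of the policy-dependent cost mean and std."""
    w = kernel_weights(X, x)
    mu = np.dot(w, costs)
    var = np.dot(w, (costs - mu) ** 2)   # input-dependent (heteroscedastic) variance
    return mu, np.sqrt(var)

def select_policy(X, costs, candidates, kappa):
    """Greedy risk-sensitive selection: minimize mu + kappa * sigma."""
    scores = [heteroscedastic_fit(X, costs, c) for c in candidates]
    J = np.array([mu + kappa * sd for mu, sd in scores])
    return candidates[int(np.argmin(J))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 1-D policy space: both the mean cost and the noise depend on theta.
    X = rng.uniform(0.0, 1.0, size=(60, 1))
    noise = 0.05 + 0.4 * X[:, 0]
    costs = (X[:, 0] - 0.7) ** 2 + noise * rng.standard_normal(60)

    candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
    for kappa in (0.0, 1.0, 3.0):        # vary risk sensitivity at runtime
        theta = select_policy(X, costs, candidates, kappa)
        print(f"kappa={kappa:.1f} -> selected theta={theta[0]:.2f}")
```

Because the learned cost model is independent of the selection criterion, changing kappa only re-scores the candidate policies; no relearning is needed, which mirrors the variable risk property the abstract describes.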