Least absolute policy iteration for robust value function approximation

  • Authors:
  • Masashi Sugiyama; Hirotaka Hachiya; Hisashi Kashima; Tetsuro Morimura

  • Affiliations:
  • Department of Computer Science, Tokyo Institute of Technology, Japan (Sugiyama; Hachiya); IBM Research, Tokyo Research Laboratory, Japan (Kashima; Morimura)

  • Venue:
  • ICRA'09: Proceedings of the 2009 IEEE International Conference on Robotics and Automation
  • Year:
  • 2009

Abstract

Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss to enhance robustness and reliability. The proposed method is formulated as a linear programming problem that can be solved efficiently by standard optimization software, so robustness is gained without sacrificing the computational advantage. We demonstrate the usefulness of the proposed approach through simulated robot-control tasks.
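The core idea of the abstract — replacing the squared loss with the absolute loss and solving the resulting fit as a linear program — can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it shows only the generic least-absolute-deviation fitting step (minimize the sum of |Φw − r| over weights w) via the standard slack-variable LP reformulation, using `scipy.optimize.linprog`. The feature matrix `Phi` and reward vector `r` are illustrative placeholders.

```python
import numpy as np
from scipy.optimize import linprog

def least_absolute_fit(Phi, r):
    """Minimize sum_i |Phi[i] @ w - r[i]| over w, posed as an LP.

    Introduce slacks e_i >= |Phi[i] @ w - r[i]|, i.e.
        Phi @ w - e <= r   and   -Phi @ w - e <= -r,
    then minimize sum(e). Decision vector is x = [w, e].
    """
    n, d = Phi.shape
    c = np.concatenate([np.zeros(d), np.ones(n)])       # objective: sum of slacks
    I = np.eye(n)
    A_ub = np.block([[Phi, -I], [-Phi, -I]])            # the two residual bounds
    b_ub = np.concatenate([r, -r])
    bounds = [(None, None)] * d + [(0, None)] * n       # w free, slacks nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]

# Toy data: rewards on a line, with one large outlier (hypothetical example)
x = np.arange(10, dtype=float)
Phi = np.column_stack([x, np.ones_like(x)])             # linear basis functions
r = 2.0 * x + 1.0
r[5] += 100.0                                           # corrupt one observation
w = least_absolute_fit(Phi, r)
```

In this toy example, the absolute-loss fit passes through the nine uncorrupted points, recovering slope 2 and intercept 1 despite the outlier; an ordinary least-squares fit would be pulled noticeably toward the corrupted reward. This is the robustness property the abstract claims for the reward-fitting step.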