Reinforcement Learning State Estimator

Authors:
Jun Morimoto;Kenji Doya
Affiliations:
JST, ICORP, Computational Brain Project, 4-1-8 Honcho, Kawaguchi, Saitama, 332-0012, Japan, xmorimo@atr.jp;Initial Research Project, Okinawa Institute of Science and Technology, Uruma, Okinawa 904-2234, Japan, doya@irp.oist.jp
Venue:
Neural Computation
Year:
2007

Abstract

In this study, we propose a novel use of reinforcement learning for estimating hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that estimation errors cannot be observed directly. However, by defining errors of observable variables as a delayed penalty, we can apply a reinforcement learning framework to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single pendulum dynamics and showed that the joint angle could be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that a state estimator can be acquired for the pendulum swing-up task while a swing-up controller is simultaneously acquired by reinforcement learning. Furthermore, we demonstrate that the dynamics of the pendulum itself can be estimated while the hidden variables are estimated in the pendulum swing-up task. Application of the proposed method to a two-linked biped model is also presented.
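The idea in the abstract, treating the error on the observable variable as a penalty and adjusting a feedback input gain to reduce it, can be illustrated with a small sketch. This is not the authors' implementation. The pendulum parameters, the linear form of the feedback term, the episodic Gaussian-perturbation (REINFORCE-style) update, the clipping of the advantage and the gain, and all function names (`pendulum_step`, `observer_step`, `episode_penalty`, `train_gain`) are assumptions chosen for illustration only.

```python
import numpy as np

DT = 0.01                    # Euler step in seconds (assumed)
G, LEN, MU = 9.8, 1.0, 0.1   # gravity, length, damping (illustrative values)

def pendulum_step(x, dt=DT):
    """One Euler step of an unforced damped pendulum; x = [angle, angular velocity]."""
    theta, omega = x
    domega = -(G / LEN) * np.sin(theta) - MU * omega
    return np.array([theta + dt * omega, omega + dt * domega])

def observer_step(x_hat, y, gain, dt=DT):
    """Model prediction plus a feedback input: gain * (observed - predicted velocity)."""
    err = y - x_hat[1]
    return pendulum_step(x_hat, dt) + dt * gain * err

def episode_penalty(gain, steps=500):
    """Accumulated squared error on the observable variable over one rollout."""
    x = np.array([1.0, 0.0])        # true state; the angle is hidden from the estimator
    x_hat = np.array([0.0, 0.0])    # estimator starts with a wrong angle
    total = 0.0
    for _ in range(steps):
        y = x[1]                    # only the angular velocity is observed
        total += (y - x_hat[1]) ** 2 * DT
        x_hat = observer_step(x_hat, y, gain)
        x = pendulum_step(x)
    return total

def train_gain(iterations=50, sigma=0.5, lr=0.05, seed=0):
    """Episodic Gaussian-perturbation policy gradient on the gain (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(2)
    baseline = episode_penalty(mean)
    for _ in range(iterations):
        noise = rng.normal(0.0, sigma, size=2)
        penalty = episode_penalty(mean + noise)
        # Reward is the negative penalty; clipping is an added safeguard (assumption).
        advantage = np.clip(baseline - penalty, -1.0, 1.0)
        # Gradient of log N(gain | mean, sigma^2) with respect to mean is noise / sigma^2.
        mean = np.clip(mean + lr * advantage * noise / sigma**2, 0.0, 20.0)
        baseline = 0.9 * baseline + 0.1 * penalty
    return mean
```

Calling `episode_penalty(np.zeros(2))` gives the penalty of a pure open-loop model prediction, and `train_gain()` returns a two-element gain that can be passed back to `episode_penalty` for comparison. The sketch never uses the hidden angle in the penalty; only the velocity error drives learning, which is the point the abstract makes about unobservable estimation errors.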