System identification based on online variational Bayes method and its application to reinforcement learning

Authors:
Junichiro Yoshimoto;Shin Ishii;Masa-aki Sato
Affiliations:
CREST, Japan Science and Technology Corporation and Nara Institute of Science and Technology, Ikoma, Nara, Japan;CREST, Japan Science and Technology Corporation and Nara Institute of Science and Technology, Ikoma, Nara, Japan;CREST, Japan Science and Technology Corporation and ATR Human Information Science Laboratories, Soraku, Kyoto, Japan
Venue:
ICANN/ICONIP'03 Proceedings of the 2003 joint international conference on Artificial neural networks and neural information processing
Year:
2003

Abstract

In this article, we present an on-line variational Bayes (VB) method for identifying linear state space models. The learning algorithm is implemented as alternating maximization of an on-line free energy, which can also be used to determine the dimension of the internal state. We also propose a reinforcement learning (RL) method based on this system identification method and apply it to a simple automatic control problem. The results show that our method correctly determines the dimension of the internal state and acquires good control, even in a partially observable environment.
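
To make the model class concrete, the following is a minimal sketch of a linear state space model with a partially observed internal state. It assumes the standard form x_{t+1} = A x_t + B u_t + w_t and y_t = C x_t + v_t with Gaussian noise. The abstract does not give the model equations, so this form, the function name `simulate`, the matrices, and the noise levels are all illustrative assumptions. The sketch only generates data; it does not implement the VB identification or the RL method.

```python
import random


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def simulate(A, B, C, controls, q_std, r_std, seed=0):
    """Simulate an assumed linear Gaussian state space model.

    x_{t+1} = A x_t + B u_t + w_t,   w_t ~ N(0, q_std^2 I)
    y_t     = C x_t + v_t,           v_t ~ N(0, r_std^2)

    The observation y_t is a scalar, so the internal state x_t is only
    partially observable whenever it has more than one dimension.
    """
    rng = random.Random(seed)
    n = len(A)
    x = [0.0] * n  # hypothetical initial state
    states, observations = [], []
    for u in controls:
        # Emit a noisy scalar observation of the current internal state.
        y = sum(c * xi for c, xi in zip(C, x)) + rng.gauss(0.0, r_std)
        states.append(x)
        observations.append(y)
        # Advance the internal state under the scalar control input u.
        ax = mat_vec(A, x)
        x = [ax[i] + B[i] * u + rng.gauss(0.0, q_std) for i in range(n)]
    return states, observations


if __name__ == "__main__":
    # Hypothetical position/velocity system: only the position is observed.
    A = [[1.0, 0.1], [0.0, 1.0]]
    B = [0.0, 0.1]
    C = [1.0, 0.0]
    states, observations = simulate(A, B, C, [1.0] * 5, q_std=0.01, r_std=0.05)
    for x, y in zip(states, observations):
        print(x, y)
```

In this setting, a system identification method only sees `controls` and `observations`; the matrices and the dimension of the internal state (2 in this example) would have to be inferred from them.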