REINFORCEMENT LEARNING FOR POMDP USING STATE CLASSIFICATION

Authors:
Le Tien Dung; Takashi Komeda; Motoki Takagi
Affiliations:
Graduate School of Engineering, Shibaura Institute of Technology, Saitama, Japan; Faculty of System Engineering, Shibaura Institute of Technology, Saitama, Japan; Faculty of System Engineering, Shibaura Institute of Technology, Saitama, Japan
Venue:
Applied Artificial Intelligence
Year:
2008

Abstract

Reinforcement learning (RL) has been widely used to solve problems in which the environment provides little feedback. Q-learning can solve Markov decision processes (MDPs) quite well. For partially observable Markov decision processes (POMDPs), a recurrent neural network (RNN) can be used to approximate Q-values. However, learning time for these problems is typically very long. We present a new combination of RL and an RNN to find a good policy for POMDPs in a shorter learning time. The method has two phases: first, the state space is divided into two groups (a fully observable state group and a hidden state group); second, a Q-value table is used to store the values of fully observable states, and an RNN is used to approximate the values of hidden states. Experimental results on two grid-world problems show that the proposed method enables an agent to acquire a policy with better learning performance than a method that uses only an RNN.
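
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the set of hidden observations (`hidden_obs`) is supplied by hand because the abstract does not say how states are classified, the class names `TinyRNN` and `HybridAgent` are hypothetical, the recurrent part is a forward-only Elman-style network with fixed random weights (training is omitted), and the table update is the standard one-step Q-learning rule.

```python
import math
import random


class TinyRNN:
    """Forward-only Elman-style network (illustrative placeholder, no training)."""

    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_in = mat(n_hidden, n_in)
        self.w_rec = mat(n_hidden, n_hidden)
        self.w_out = mat(n_actions, n_hidden)
        self.h = [0.0] * n_hidden

    def reset(self):
        """Clear the recurrent memory at the start of an episode."""
        self.h = [0.0] * len(self.h)

    def step(self, x):
        # h_t = tanh(W_in x_t + W_rec h_{t-1});  Q_t = W_out h_t
        self.h = [
            math.tanh(
                sum(w * v for w, v in zip(w_i, x))
                + sum(w * v for w, v in zip(w_r, self.h))
            )
            for w_i, w_r in zip(self.w_in, self.w_rec)
        ]
        return [sum(w * v for w, v in zip(w_o, self.h)) for w_o in self.w_out]


class HybridAgent:
    """Q-table for fully observable observations, RNN for hidden ones."""

    def __init__(self, n_obs, n_actions, hidden_obs, alpha=0.1, gamma=0.9):
        self.n_obs = n_obs
        self.hidden_obs = set(hidden_obs)  # assumed to be given (phase one)
        self.alpha, self.gamma = alpha, gamma
        self.q_table = [[0.0] * n_actions for _ in range(n_obs)]
        self.rnn = TinyRNN(n_obs, 8, n_actions)

    def q_values(self, obs):
        one_hot = [1.0 if i == obs else 0.0 for i in range(self.n_obs)]
        # Step the RNN on every observation so its memory sees the full history.
        rnn_q = self.rnn.step(one_hot)
        return rnn_q if obs in self.hidden_obs else list(self.q_table[obs])

    def update_table(self, obs, action, reward, next_q, done):
        """One-step Q-learning update, applied only to fully observable observations."""
        if obs in self.hidden_obs:
            return  # hidden observations would be handled by RNN training instead
        target = reward + (0.0 if done else self.gamma * max(next_q))
        self.q_table[obs][action] += self.alpha * (target - self.q_table[obs][action])


agent = HybridAgent(n_obs=4, n_actions=2, hidden_obs={2})
agent.update_table(obs=0, action=1, reward=1.0, next_q=[0.0, 0.0], done=True)
print(agent.q_values(0))  # values read from the table
print(agent.q_values(2))  # values produced by the recurrent network
```

Under these assumptions, the table gives exact value storage and a cheap update where the observation identifies the state, and only the ambiguous observations need the recurrent approximator. A full version would also train the RNN, for example by backpropagating the temporal-difference error through time.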