A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems

  • Authors:
  • Lei Zheng; Siu-Yeung Cho

  • Affiliations:
  • School of Computer Engineering, Nanyang Technological University, Singapore

  • Venue:
  • Neural Processing Letters
  • Year:
  • 2011

Abstract

Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic, partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) over belief states. However, because the belief-state space is continuous and multi-dimensional, the resulting problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require a complete POMDP model of the environment, which is not always available. This article introduces a modified memory-based reinforcement learning algorithm, called modified U-Tree, that is capable of learning from raw sensor experiences with minimal prior knowledge. The article describes an enhancement of the original U-Tree's state generation process that makes the generated model more compact, and also proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against traditional model-based algorithms on a set of well-known POMDP problems.
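As context for the belief-state transformation mentioned in the abstract, the following is a minimal sketch of the standard Bayes-filter belief update that converts a POMDP into a belief MDP. It assumes known transition probabilities `T[s, a, s']` and observation probabilities `O[s', a, o]`; the array layout and function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter over hidden states:
        b'(s') ∝ O[s', a, o] * sum_s T[s, a, s'] * b(s)
    b is a length-|S| probability vector; the update is O(|S|^2) per step,
    which is why exact planning over beliefs becomes intractable as |S| grows.
    """
    b_next = O[:, a, o] * (T[:, a, :].T @ b)  # predict, then weight by observation likelihood
    norm = b_next.sum()
    if norm == 0.0:
        raise ValueError("Observation has zero probability under belief b.")
    return b_next / norm  # renormalize to a proper distribution
```

The abstract also refers to a statistical test used in state generation. The original U-Tree distinguishes states by testing whether distributions of future discounted returns differ across candidate leaf splits, classically with a Kolmogorov-Smirnov test; the paper proposes a modified test for reward estimation. Below is a hedged sketch of such a split criterion only, with a hypothetical helper name and significance threshold, not the paper's specific modification.

```python
from scipy.stats import ks_2samp

def should_split(returns_left, returns_right, alpha=0.05):
    """Split a fringe node if the two samples of discounted returns
    appear to come from different distributions (two-sample KS test)."""
    statistic, p_value = ks_2samp(returns_left, returns_right)
    return p_value < alpha
```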