Reinforcement learning for POMDPs based on action values and stochastic optimization

  • Authors:
  • Theodore J. Perkins

  • Affiliations:
  • Department of Computer Science, University of Massachusetts Amherst, 140 Governor's Drive, Amherst, MA

  • Venue:
  • Eighteenth National Conference on Artificial Intelligence (AAAI-02)
  • Year:
  • 2002

Abstract

We present a new, model-free reinforcement learning algorithm for learning to control partially-observable Markov decision processes. The algorithm incorporates ideas from action-value based reinforcement learning approaches, such as Q-Learning, as well as ideas from the stochastic optimization literature. Key to our approach is a new definition of action value, which makes the algorithm theoretically sound for partially-observable settings. We show that special cases of our algorithm can achieve probability one convergence to locally optimal policies in the limit, or probably approximately correct hill-climbing to a locally optimal policy in a finite number of samples.
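To make the abstract's high-level recipe concrete, the sketch below illustrates one generic way to combine Monte-Carlo action values defined over observations with stochastic hill-climbing in the space of memoryless stochastic policies. The toy POMDP, the function names, and the acceptance rule are illustrative assumptions only; they do not reproduce the paper's action-value definition or its convergence guarantees.

```python
# Illustrative sketch only: Monte-Carlo action values over observations plus
# stochastic hill-climbing over memoryless stochastic policies. The toy POMDP,
# names, and update rule are hypothetical, not the paper's algorithm.
import random

class ToyPOMDP:
    """Two hidden states aliased to a single observation; two actions."""
    def reset(self):
        self.state = 0 if random.random() < 0.7 else 1   # state 0 is more likely
        return 0                                          # same observation for both states

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0     # matching action pays off
        done = random.random() < 0.1                      # geometric episode length
        self.state = 0 if random.random() < 0.7 else 1
        return 0, reward, done

def sample_action(policy, obs):
    """policy[obs] is P(action = 0 | obs)."""
    return 0 if random.random() < policy[obs] else 1

def run_episode(env, policy, max_steps=200):
    """Return the list of (obs, action, reward) triples for one episode."""
    obs, traj = env.reset(), []
    for _ in range(max_steps):
        a = sample_action(policy, obs)
        next_obs, r, done = env.step(a)
        traj.append((obs, a, r))
        obs = next_obs
        if done:
            break
    return traj

def mc_action_values(env, policy, episodes=2000, gamma=0.95):
    """First-visit Monte-Carlo estimates of Q(obs, action) under `policy`."""
    totals, counts = {}, {}
    for _ in range(episodes):
        traj, seen = run_episode(env, policy), set()
        for t, (obs, a, _) in enumerate(traj):
            if (obs, a) in seen:
                continue
            seen.add((obs, a))
            g = sum(gamma ** k * r for k, (_, _, r) in enumerate(traj[t:]))
            totals[(obs, a)] = totals.get((obs, a), 0.0) + g
            counts[(obs, a)] = counts.get((obs, a), 0) + 1
    return {k: totals[k] / counts[k] for k in totals}

def estimate_return(env, policy, episodes=1000, gamma=0.95):
    return sum(sum(gamma ** k * r for k, (_, _, r) in enumerate(run_episode(env, policy)))
               for _ in range(episodes)) / episodes

def hill_climb(env, iters=10, step=0.2):
    policy = {0: 0.5}                                     # start from the uniform policy
    best = estimate_return(env, policy)
    for _ in range(iters):
        q = mc_action_values(env, policy)
        # Propose shifting probability toward the action with the higher estimated value.
        direction = step if q.get((0, 0), 0.0) >= q.get((0, 1), 0.0) else -step
        candidate = {0: min(1.0, max(0.0, policy[0] + direction))}
        value = estimate_return(env, candidate)
        if value > best:                                  # keep the step only if it looks better
            policy, best = candidate, value
    return policy, best

if __name__ == "__main__":
    random.seed(0)
    policy, value = hill_climb(ToyPOMDP())
    print(f"P(action 0 | obs) = {policy[0]:.2f}, estimated return = {value:.2f}")
```

In this toy setting the two hidden states share one observation, so the best memoryless policy simply favors the action that pays off in the more likely state; the paper's contribution is a principled action-value definition and update scheme for which such hill-climbing comes with convergence and PAC-style guarantees.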