Experience generalization for concurrent reinforcement learners: the minimax-QS algorithm. Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 3.
Artificial Intelligence Review.
ε-MDPs: learning in varying environments. The Journal of Machine Learning Research.
Application of Markov chains in an interactive information retrieval system. Information Processing and Management: An International Journal.
A Unified Analysis of Value-Function-Based Reinforcement Learning Algorithms. Neural Computation.
Heuristic Reinforcement Learning Applied to RoboCup Simulation Agents. RoboCup 2007: Robot Soccer World Cup XI.
Optimistic-Pessimistic Q-Learning Algorithm for Multi-Agent Systems. MATES '08: Proceedings of the 6th German Conference on Multiagent System Technologies.
Multi-Agent Reinforcement Learning Algorithm with Variable Optimistic-Pessimistic Criterion. ECAI 2008: Proceedings of the 18th European Conference on Artificial Intelligence.
Improving Reinforcement Learning by Using Case Based Heuristics. ICCBR '09: Proceedings of the 8th International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development.
Perseus: randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research.
Heuristic selection of actions in multiagent reinforcement learning. IJCAI '07: Proceedings of the 20th International Joint Conference on Artificial Intelligence.
Relational reinforcement learning applied to shared attention. IJCNN '09: Proceedings of the 2009 International Joint Conference on Neural Networks.
Heuristic Q-learning soccer players: a new reinforcement learning approach to RoboCup simulation. EPIA '07: Proceedings of the 13th Portuguese Conference on Progress in Artificial Intelligence.
The problem of maximizing the expected total discounted reward in a completely observable Markovian environment, i.e., a Markov decision process (MDP), models a particular class of sequential decision problems. Algorithms have been developed for making optimal decisions in MDPs given either an MDP specification or the opportunity to interact with the MDP over time. Recently, other sequential decision-making problems have been studied, prompting the development of new algorithms and analyses. We describe a new generalized model that subsumes MDPs as well as many of the recent variations. We prove some basic results concerning this model and develop generalizations of value iteration, policy iteration, model-based reinforcement learning, and Q-learning that can be used to make optimal decisions in the generalized model under various assumptions. Applications of the theory to particular models are described, including risk-averse MDPs, exploration-sensitive MDPs, SARSA, Q-learning with spreading, two-player games, and approximate max picking via sampling. Central to the results are the contraction property of the value operator and a stochastic-approximation theorem that reduces asynchronous convergence to synchronous convergence.
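The generalized value iteration described above can be sketched in code. The update V ← ⊗_a (R + γ · E_{s'}[V]) uses a pluggable operator ⊗ that summarizes over actions: max recovers the standard MDP, min a pessimistic (e.g. risk-averse or adversarial) variant. Because the update remains a γ-contraction in the sup norm for any non-expansive summarizer, iteration converges to a unique fixed point. This is a minimal illustrative sketch, not the paper's implementation; all names (`P`, `R`, `summarize_actions`, etc.) are assumptions introduced here.

```python
import numpy as np

def generalized_value_iteration(P, R, gamma, summarize_actions,
                                tol=1e-8, max_iter=10_000):
    """Generalized value iteration (illustrative sketch).

    P: (A, S, S) transition probabilities; R: (A, S) immediate rewards.
    summarize_actions: maps per-action values (A, S) -> (S,); max gives the
    standard Bellman optimality update, other non-expansive summarizers give
    the generalized variants discussed in the abstract.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Expectation over next states s' (the "summary over outcomes"):
        Q = R + gamma * np.einsum("ast,t->as", P, V)
        # Summary over actions (max for MDPs, min for a pessimistic variant):
        V_new = summarize_actions(Q)
        if np.max(np.abs(V_new - V)) < tol:  # sup-norm convergence test
            return V_new
        V = V_new
    return V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(3), size=(2, 3))   # 2 actions, 3 states
    R = rng.standard_normal((2, 3))
    V_opt = generalized_value_iteration(P, R, 0.9, lambda Q: Q.max(axis=0))
    V_pes = generalized_value_iteration(P, R, 0.9, lambda Q: Q.min(axis=0))
    # By monotonicity of both operators, optimistic values dominate pessimistic ones:
    assert np.all(V_opt >= V_pes - 1e-6)
```

Swapping in other summarizers (e.g. a convex combination of max and min, or an opponent's minimization in a two-player game) changes which of the abstract's models the iteration solves, without touching the convergence argument.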