Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (MDP) model is a popular way of formalizing the reinforcement-learning problem, but it is by no means the only way. In this paper, we show how many of the important theoretical results concerning reinforcement learning in MDPs extend to a generalized MDP model that includes MDPs, two-player games, and MDPs under a worst-case optimality criterion as special cases. The basis of this extension is a stochastic-approximation theorem that reduces asynchronous convergence to synchronous convergence.

Keywords: Reinforcement learning, Q-learning convergence, Markov games
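The unifying idea described above — one Bellman-style update in which only the operator that summarizes next-state values changes across MDPs, games, and the worst-case criterion — can be illustrated with a small sketch. This is not code from the paper; the function names and the tabular setup are assumptions chosen for illustration. The `summarize` argument plays the role of the generalized model's pluggable operator: the expectation under the transition distribution recovers the ordinary MDP, while a minimum over reachable next states gives the worst-case criterion.

```python
import numpy as np

def value_iteration(R, P, gamma, summarize, iters=200):
    """Generalized value iteration (illustrative sketch).

    R[s, a]     -- immediate reward for taking action a in state s
    P[s, a, s'] -- transition probability to s'
    summarize(P_sa, V) -> scalar summary of next-state values;
    swapping this operator switches between the special cases
    (expectation = MDP, min over support = worst-case MDP).
    """
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = np.array([[R[s, a] + gamma * summarize(P[s, a], V)
                       for a in range(n_actions)]
                      for s in range(n_states)])
        V = Q.max(axis=1)  # greedy choice over the agent's actions
    return V

# Expected-value summary: the standard MDP Bellman operator.
expectation = lambda p, V: p @ V
# Worst-case summary: minimum value over next states with nonzero probability.
worst_case = lambda p, V: V[p > 0].min()
```

Because the worst-case summary never exceeds the expectation for the same value function, worst-case values are a pointwise lower bound on the standard MDP values; running both operators on the same `(R, P)` makes that ordering visible.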