IEEE Transactions on Systems, Man and Cybernetics
Learning optimal discriminant functions through a cooperative game of automata. IEEE Transactions on Systems, Man and Cybernetics.
Learning automata: an introduction.
Convergent activation dynamics in continuous time networks. Neural Networks.
A model for reasoning about persistence and causation. Computational Intelligence.
Proceedings of the seventh international conference (1990) on Machine learning
A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research.
Practical Issues in Temporal Difference Learning. Machine Learning.
The Convergence of TD(λ) for General λ. Machine Learning.
Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM Journal on Control and Optimization.
Efficient learning and planning within the Dyna framework. Adaptive Behavior.
Asynchronous Stochastic Approximation and Q-Learning. Machine Learning.
An Upper Bound on the Loss from Approximate Optimal-Value Functions. Machine Learning.
Reinforcement learning algorithms for average-payoff Markovian decision processes. AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1).
Learning to act using real-time dynamic programming. Artificial Intelligence - Special volume on computational research on interaction and agency, part 1.
Linear least-squares algorithms for temporal difference learning. Machine Learning - Special issue on reinforcement learning.
Feature-based methods for large scale dynamic programming. Machine Learning - Special issue on reinforcement learning.
Reinforcement learning with replacing eligibility traces. Machine Learning - Special issue on reinforcement learning.
Average reward reinforcement learning: foundations, algorithms, and empirical results. Machine Learning - Special issue on reinforcement learning.
Stochastic approximation with two time scales. Systems & Control Letters.
Asynchronous Stochastic Approximations. SIAM Journal on Control and Optimization.
Model-based average reward reinforcement learning. Artificial Intelligence.
Planning and acting in partially observable stochastic domains. Artificial Intelligence.
Reinforcement learning with hierarchies of machines. NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10.
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence.
The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning. SIAM Journal on Control and Optimization.
Actor-Critic-Type Learning Algorithms for Markov Decision Processes. SIAM Journal on Control and Optimization.
Bounded-parameter Markov decision processes. Artificial Intelligence.
Hierarchical multi-agent reinforcement learning. Proceedings of the fifth international conference on Autonomous agents.
Multiagent learning using a variable learning rate. Artificial Intelligence.
Dynamic Programming and Optimal Control.
Learning Automata and Stochastic Optimization.
Introduction to Reinforcement Learning.
Neuro-Dynamic Programming.
Learning Algorithms for Markov Decision Processes with Average Cost. SIAM Journal on Control and Optimization.
Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms. SIAM Journal on Control and Optimization.
Kernel-Based Reinforcement Learning. Machine Learning.
Near-Optimal Reinforcement Learning in Polynomial Time. Machine Learning.
Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems.
Introduction: The Challenge of Reinforcement Learning. Machine Learning.
Friend-or-Foe Q-learning in General-Sum Games. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning.
Reinforcement Learning in POMDPs with Function Approximation. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Practical Reinforcement Learning in Continuous Spaces. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.
PEGASUS: A policy search method for large MDPs and POMDPs. UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence.
Q-Learning for Risk-Sensitive Control. Mathematics of Operations Research.
Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning.
ε-MDPs: learning in varying environments. The Journal of Machine Learning Research.
Nash Q-learning for general-sum stochastic games. The Journal of Machine Learning Research.
A Tabu-Search Hyperheuristic for Timetabling and Rostering. Journal of Heuristics.
The Linear Programming Approach to Approximate Dynamic Programming. Operations Research.
The Journal of Machine Learning Research
INFORMS Journal on Computing
Evolutionary Function Approximation for Reinforcement Learning. The Journal of Machine Learning Research.
On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation.
Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research.
Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research.
Risk-sensitive reinforcement learning applied to control under constraints. Journal of Artificial Intelligence Research.
Reinforcement learning: a survey. Journal of Artificial Intelligence Research.
Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research.
A reinforcement learning approach to job-shop scheduling. IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2.
Reinforcement learning as a means of dynamic aggregate QoS provisioning. Art-QoS'03 Proceedings of the 2003 international conference on Architectures for quality of service in the internet.
Actor-critic algorithms for hierarchical Markov decision processes. Automatica (Journal of IFAC).
IEEE Transactions on Wireless Communications
Approximate stochastic annealing for online control of infinite horizon Markov decision processes. Automatica (Journal of IFAC).
Policy sharing between multiple mobile robots using decision trees. Information Sciences: an International Journal.
Generation of tests for programming challenge tasks using multi-objective optimization. Proceedings of the 15th annual conference companion on Genetic and evolutionary computation.
In the last few years, reinforcement learning (RL), also called adaptive (or approximate) dynamic programming, has emerged as a powerful tool for solving complex sequential decision-making problems in control theory. Although seminal research in this area was performed in the artificial intelligence (AI) community, it has more recently attracted the attention of optimization theorists because of several noteworthy success stories from operations management. It is on large-scale and complex problems of dynamic optimization, in particular the Markov decision problem (MDP) and its variants, that the power of RL is most apparent. It has been known for many years that on large-scale MDPs, the curse of dimensionality and the curse of modeling render classical dynamic programming (DP) ineffective. The excitement in RL stems from its direct attack on these curses, which allows it to solve problems previously considered intractable via classical DP. The success of RL is due to its strong mathematical roots in the principles of DP, Monte Carlo simulation, function approximation, and AI. Topics treated in some detail in this survey are temporal differences, Q-learning, semi-MDPs, and stochastic games. Several recent advances in RL, e.g., policy gradients and hierarchical RL, are covered along with references. Pointers to numerous examples of applications are provided. This overview is aimed at uncovering the mathematical roots of this science so that readers gain a clear understanding of the core concepts and are able to use them in their own research. The survey points to more than 100 references from the literature.
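To give a concrete flavor of one topic the abstract names, the sketch below shows tabular Q-learning on a hypothetical toy MDP (a 4-state chain where moving right toward a goal state yields reward 1). The environment, state count, and learning parameters are illustrative choices, not taken from the survey; the update rule itself is the standard Q-learning recursion.

```python
import random

# Hypothetical toy MDP: states 0..3 on a chain, actions 0 (left) and 1 (right).
# Reaching state 3 yields reward 1 and ends the episode.
N_STATES, GOAL = 4, 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # step size, discount, exploration rate

def step(s, a):
    """Deterministic transition: move left or right along the chain."""
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap on the greedy value of the next state
            target = r + (0.0 if done else GAMMA * max(Q[s2]))
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q

Q = train()
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)  # the greedy policy should move right toward the goal
```

Model-freeness is the point here: the update never consults the transition function directly, only sampled (state, action, reward, next-state) tuples, which is what lets RL sidestep the curse of modeling that the abstract mentions.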