Reinforcement Learning (RL) is a simulation-based technique for solving Markov decision processes when the transition probabilities are not easily obtainable or when the problem has a very large number of states. We present an empirical study of (i) the effect of step-sizes (learning rules) on the convergence of RL algorithms, (ii) stochastic shortest paths in solving average reward problems via RL, and (iii) the notion of survival probabilities (downside risk) in RL. We also study the impact of step-sizes when function approximation is combined with RL. Our experiments yield some interesting insights that will be useful in practice when RL algorithms are implemented within simulators.
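To make point (i) concrete, the following is a minimal sketch of tabular Q-learning on a hypothetical two-state, two-action MDP, written so that the step-size rule is a pluggable function of the state-action visit count. The MDP dynamics, the exploration rate, and the step-size constants (e.g., the Robbins-Monro-style rule A/(B+n)) are illustrative assumptions, not the experimental setup of the paper.

```python
import random

def step(s, a, rng):
    """Hypothetical 2-state MDP used only for illustration.

    Action 0 is a "safe" action that usually keeps the state and
    pays 1.0; action 1 is a "risky" action with higher but more
    variable reward. Returns (next_state, reward).
    """
    if a == 0:
        return (s, 1.0) if rng.random() < 0.9 else (1 - s, 0.0)
    return (1 - s, 2.0) if rng.random() < 0.5 else (s, -1.0)

def q_learning(stepsize, iterations=5000, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with a user-supplied step-size rule.

    `stepsize` maps the visit count n of the updated (s, a) pair to
    a learning rate alpha_n, so different learning rules (constant,
    A/(B+n), 1/n, ...) can be compared on the same trajectory seed.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    s = 0
    for _ in range(iterations):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            a = rng.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[s][x])
        s2, r = step(s, a, rng)
        visits[s][a] += 1
        alpha = stepsize(visits[s][a])
        # Standard Q-learning update.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
    return Q

# Compare a decaying (stochastic-approximation style) rule against
# a constant step-size on the same toy problem.
Q_decay = q_learning(lambda n: 150.0 / (300.0 + n))
Q_const = q_learning(lambda n: 0.1)
```

With a decaying rule satisfying the usual stochastic-approximation conditions, the iterates settle down as the step-size shrinks, whereas a constant step-size keeps fluctuating around the fixed point; this is the kind of behavior the step-size experiments examine empirically.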