Reinforcement learning for model building and variance-penalized control

Authors: Abhijit Gosavi
Affiliation: Missouri University of Science and Technology, Rolla, MO
Venue: Winter Simulation Conference
Year: 2009

Abstract

Reinforcement learning (RL) is a simulation-based technique for solving Markov decision problems or processes (MDPs). It is especially useful when the transition probabilities of the MDP are hard to obtain or when the number of states is too large for classical dynamic programming. In this paper, we present a new model-based RL algorithm that builds a transition model without explicitly generating the transition probabilities, whereas existing model-based RL algorithms in the literature attempt to compute the transition probabilities. We also present a variance-penalized Bellman equation and an RL algorithm that uses it to solve a variance-penalized MDP. We conclude with numerical experiments on both algorithms.
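To make the variance-penalized objective concrete, here is a minimal Python sketch, assuming one common form of the criterion for average-reward MDPs: maximize ρ − θσ², where ρ is the long-run average reward per step, σ² is the long-run average of (r − ρ)², and θ ≥ 0 is a risk-aversion weight. The two-state MDP (`P`, `R`) and the helpers `exact_score`, `simulated_score` and `best_policy` are hypothetical and for illustration only. This is not the paper's RL algorithm: it evaluates each deterministic policy exactly (or from one simulated trajectory) and picks the best by brute-force enumeration, which only works for tiny problems.

```python
import random
from itertools import product

# Hypothetical 2-state, 2-action MDP for illustration only (not from the paper).
# P[a][i][j] = probability of moving from state i to state j under action a.
# R[a][i][j] = reward earned on that transition.
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # action 0: "safe"
    [[0.2, 0.8], [0.8, 0.2]],  # action 1: "risky"
]
R = [
    [[5.0, 5.0], [5.0, 5.0]],    # action 0 always pays 5
    [[0.0, 10.0], [10.0, 0.0]],  # action 1 pays 10 on a switch, 0 otherwise
]
N_STATES = 2


def stationary(rows, iters=1000):
    """Stationary distribution of an irreducible, aperiodic chain via power iteration."""
    n = len(rows)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * rows[i][j] for i in range(n)) for j in range(n)]
    return pi


def exact_score(policy, theta):
    """Return (rho, sigma2, rho - theta * sigma2) for a deterministic policy.

    rho    = long-run average reward per step
    sigma2 = long-run average of (r - rho)^2 (assumed variance measure)
    """
    rows = [P[policy[i]][i] for i in range(N_STATES)]
    pi = stationary(rows)
    pairs = [(i, j) for i in range(N_STATES) for j in range(N_STATES)]
    rho = sum(pi[i] * rows[i][j] * R[policy[i]][i][j] for i, j in pairs)
    sigma2 = sum(pi[i] * rows[i][j] * (R[policy[i]][i][j] - rho) ** 2
                 for i, j in pairs)
    return rho, sigma2, rho - theta * sigma2


def simulated_score(policy, theta, steps=100_000, seed=0):
    """Estimate the same quantities from one simulated trajectory.

    The estimator uses only the sampled rewards; P is used solely inside
    the simulator to generate the next state.
    """
    rng = random.Random(seed)
    state, rewards = 0, []
    for _ in range(steps):
        a = policy[state]
        nxt = rng.choices(range(N_STATES), weights=P[a][state])[0]
        rewards.append(R[a][state][nxt])
        state = nxt
    rho = sum(rewards) / steps
    sigma2 = sum((r - rho) ** 2 for r in rewards) / steps
    return rho, sigma2, rho - theta * sigma2


def best_policy(theta):
    """Brute-force search over all deterministic policies (tiny MDPs only)."""
    policies = product(range(len(P)), repeat=N_STATES)
    return max(policies, key=lambda pol: exact_score(pol, theta)[2])


if __name__ == "__main__":
    for theta in (0.0, 0.1, 0.2):
        print(theta, best_policy(theta))
```

With these made-up numbers, `best_policy(0.0)` and `best_policy(0.1)` return the risky policy `(1, 1)` (ρ = 8, σ² = 16), while `best_policy(0.2)` returns the safe policy `(0, 0)` (ρ = 5, σ² = 0): a larger θ trades average reward for lower variability. `simulated_score` hints at why simulation-based methods are attractive, since its estimates come from sampled rewards rather than from the transition probabilities.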