Average-Reward Reinforcement Learning for Variance Penalized Markov Decision Problems
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Reinforcement learning (RL) is a simulation-based technique for solving Markov decision problems or processes (MDPs). It is especially useful when the transition probabilities of the MDP are hard to obtain or when the number of states is very large. In this paper, we present a new model-based RL algorithm that builds the transition-probability model without explicitly generating the individual transition probabilities; in contrast, the existing literature on model-based RL attempts to compute these probabilities directly. We also present a variance-penalized Bellman equation and an RL algorithm that uses it to solve a variance-penalized MDP. We conclude with numerical experiments on these algorithms.
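To make the variance-penalized objective concrete, the sketch below runs a tabular, average-reward Q-learning update on the penalized reward r − θ(r − ρ)², where ρ is a running estimate of the long-run average reward. This is one common way to fold a variance penalty into a Bellman-style update, not the paper's exact algorithm; the two-state MDP (matrices `P` and `R`), the penalty weight `theta`, and the step-size schedule are all illustrative assumptions.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP used purely for illustration.
# P[s, a] gives the distribution over next states; R[s, a, s2] the reward.
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.2, 0.8], [0.9, 0.1]]])
R = np.array([[[6.0, -5.0], [10.0, 17.0]],
              [[-14.0, 13.0], [14.0, -23.0]]])

rng = np.random.default_rng(0)

def variance_penalized_q_learning(theta=0.1, steps=20000, alpha0=0.1):
    """Tabular average-reward Q-learning on the penalized reward
    g = r - theta * (r - rho)^2.  rho tracks the average (unpenalized)
    reward used in the penalty; phi tracks the average penalized reward
    used in the relative-value update.  A sketch under assumptions,
    not a definitive implementation of the paper's algorithm."""
    n_s, n_a = P.shape[0], P.shape[1]
    Q = np.zeros((n_s, n_a))
    rho, phi = 0.0, 0.0
    s = 0
    for k in range(1, steps + 1):
        # epsilon-greedy exploration
        if rng.random() < 0.1:
            a = int(rng.integers(n_a))
        else:
            a = int(np.argmax(Q[s]))
        s2 = rng.choice(n_s, p=P[s, a])
        r = R[s, a, s2]
        rho += (r - rho) / k                 # running average reward
        g = r - theta * (r - rho) ** 2       # variance-penalized reward
        phi += (g - phi) / k                 # running average penalized reward
        alpha = alpha0 / (1.0 + k * 1e-4)    # diminishing step size
        # relative-value (average-reward) Q-learning update
        Q[s, a] += alpha * (g - phi + np.max(Q[s2]) - Q[s, a])
        s = s2
    return Q, rho

Q, rho = variance_penalized_q_learning()
```

Setting `theta=0` recovers an ordinary average-reward Q-learning update, which makes the role of the penalty term easy to isolate in experiments.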