Analysis of methods for solving MDPs

Authors:
Marek Grześ;Jesse Hoey
Affiliations:
University of Waterloo, Canada;University of Waterloo, Canada
Venue:
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Year:
2012

Abstract

New proofs for two extensions to value iteration are derived when the type of initialisation of the value function is considered. Theoretical requirements that guarantee the convergence of backward value iteration and weaker requirements for the convergence of backups based on best actions only are identified. Experimental results show that standard value iteration performs significantly faster with simple extensions that are investigated in this work.
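
For orientation, the following is a minimal sketch of textbook synchronous value iteration, run from two different initial value functions. It is not the authors' code and does not implement the extensions studied in the paper (backward value iteration, backups based on best actions only). The two-state MDP, the function name `value_iteration`, and the parameters `tol` and `max_iters` are all hypothetical and chosen only for illustration.

```python
def value_iteration(P, R, gamma, V0, tol=1e-8, max_iters=10_000):
    """Textbook synchronous value iteration (illustrative sketch).

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    V0 is the initial value function, one entry per state.
    Returns the final value function and the number of sweeps performed.
    """
    V = list(V0)
    for sweep in range(1, max_iters + 1):
        # Bellman optimality backup applied to every state.
        new_V = [
            max(
                R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(len(P))
        ]
        # Largest change in any state's value during this sweep.
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            return V, sweep
    return V, max_iters


# Hypothetical toy MDP (an assumption, not taken from the paper):
#   state 0: action 0 moves to state 1 (reward 0), action 1 stays (reward 1)
#   state 1: single action, stays in state 1 (reward 2)
P = [
    [[(1, 1.0)], [(0, 1.0)]],
    [[(1, 1.0)]],
]
R = [
    [0.0, 1.0],
    [2.0],
]
gamma = 0.9
r_max = 2.0

# Pessimistic initialisation: all zeros.
V_pess, sweeps_pess = value_iteration(P, R, gamma, [0.0, 0.0])

# Optimistic initialisation: upper bound r_max / (1 - gamma) in every state.
upper = r_max / (1.0 - gamma)
V_opt, sweeps_opt = value_iteration(P, R, gamma, [upper, upper])

print("pessimistic:", V_pess, "sweeps:", sweeps_pess)
print("optimistic: ", V_opt, "sweeps:", sweeps_opt)
```

Both runs approach the same fixed point on this toy problem, but the number of sweeps depends on the starting values. That dependence on initialisation is the kind of effect the abstract refers to, and nothing about the paper's actual results should be inferred from this example.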