Solving finite-horizon Markov Decision Processes (MDPs) under a stationary-policy constraint is computationally difficult. Our dynamic dual decomposition approach uses Lagrange duality to decouple this hard problem into a sequence of tractable sub-problems. The resulting procedure is a straightforward modification of standard non-stationary MDP solvers and yields an upper bound on the total expected reward. Empirically, the method converges rapidly and performs favourably compared to standard planning algorithms such as policy gradients, as well as lower-bound procedures such as Expectation Maximisation.
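To make the bounding structure concrete, the sketch below (a simplified illustration under assumed toy dynamics, not the paper's exact dual-update procedure) shows the two ingredients the abstract refers to: the tractable sub-problem is a non-stationary finite-horizon MDP solved by backward induction, and dropping the stationarity constraint immediately yields an upper bound on the value of any stationary policy, which in turn lower-bounds via direct evaluation. All sizes, the random MDP, and the helper names are illustrative assumptions.

```python
import numpy as np

# Toy random MDP (assumed for illustration): S states, A actions, horizon T.
rng = np.random.default_rng(0)
S, A, T = 4, 3, 10
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.random((S, A))                       # reward R[s, a] in [0, 1)
mu0 = np.ones(S) / S                         # uniform initial-state distribution

def solve_nonstationary(R_t):
    """Backward induction for a finite-horizon MDP with time-varying rewards
    R_t[t, s, a] (the tractable sub-problem: in the dual method the rewards
    would be shifted by Lagrange multipliers). Returns the optimal value and
    the greedy per-time-step policies."""
    V = np.zeros(S)
    pis = np.zeros((T, S), dtype=int)
    for t in reversed(range(T)):
        Q = R_t[t] + P @ V          # Q[s, a] = R_t[t, s, a] + E[V(s')]
        pis[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return mu0 @ V, pis

def eval_stationary(pi):
    """Total expected reward of one deterministic stationary policy pi[s]."""
    d, total = mu0.copy(), 0.0
    Ppi = P[np.arange(S), pi]       # (S, S) transition matrix under pi
    for _ in range(T):
        total += d @ R[np.arange(S), pi]
        d = d @ Ppi
    return total

# With zero multipliers, relaxing stationarity gives an upper bound,
# and evaluating any candidate stationary policy gives a lower bound.
ub, pis = solve_nonstationary(np.broadcast_to(R, (T, S, A)))
lb = max(eval_stationary(pis[t]) for t in range(T))
print(f"lower bound {lb:.4f} <= upper bound {ub:.4f}")
```

The gap between these two bounds is what the dual iterations would then tighten, by adjusting time-indexed multipliers and re-solving the non-stationary sub-problem.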