Constraint relaxation in approximate linear programs

Authors:
Marek Petrik; Shlomo Zilberstein
Affiliations:
University of Massachusetts Amherst, Amherst, MA; University of Massachusetts Amherst, Amherst, MA
Venue:
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Year:
2009

Abstract

Approximate Linear Programming (ALP) is a reinforcement learning technique with nice theoretical properties, but it often performs poorly in practice. We identify some reasons for the poor quality of ALP solutions in problems where the approximation induces virtual loops. We then introduce two methods for improving solution quality. One method rolls out selected constraints of the ALP, guided by the dual information. The second method is a relaxation of the ALP, based on external penalty methods. The latter method is applicable in domains in which rolling out constraints is impractical. Both approaches show promising empirical results for simple benchmark problems as well as for a realistic blood inventory management problem.
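
For orientation, the relaxation mentioned in the abstract can be sketched in generic notation. The symbols below are assumptions chosen for illustration and are not taken from this record: \Phi is a feature matrix, x the weight vector, c a vector of state-relevance weights, \gamma the discount factor, r and P the rewards and transition probabilities, and d \ge 0 a vector of penalty weights. The exact formulation in the paper may differ.

```latex
% Standard ALP: minimize the weighted approximate value subject to
% one Bellman inequality per state-action pair.
\begin{align*}
\min_{x}\quad & c^{\top} \Phi x \\
\text{s.t.}\quad & (\Phi x)(s) \;\ge\; r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\,(\Phi x)(s')
  \qquad \forall\, s, a .
\end{align*}

% Collect the constraints as A x \ge b. An external (exterior) penalty
% relaxation moves them into the objective and charges only for violations:
\begin{align*}
\min_{x}\quad & c^{\top} \Phi x + d^{\top} \bigl[\, b - A x \,\bigr]_{+},
\qquad [z]_{+} = \max(z, 0) \text{ componentwise}.
\end{align*}

% Equivalent linear program with slack variables \xi:
\begin{align*}
\min_{x,\,\xi}\quad & c^{\top} \Phi x + d^{\top} \xi \\
\text{s.t.}\quad & A x + \xi \ge b, \qquad \xi \ge 0 .
\end{align*}
```

In this sketch, a very large d makes any violation prohibitively expensive, so the relaxed program behaves like the original ALP, while a smaller d lets the solution violate some constraints at a bounded cost.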