We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits of using invariant distributions of appropriate policies as projection weights. Such projection weighting relates to what is done by temporal-difference learning. Our analysis also leads to the first performance loss bound for approximate value iteration with an average-cost objective.
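To make the setup concrete, the following is a minimal sketch (not the paper's own code) of approximate value iteration with state aggregation: each partition of the state space carries a single constant cost-to-go parameter, each iteration applies the Bellman operator to the lifted value function, and the result is projected back onto the partition by a weighted average using projection weights w, which the abstract suggests taking as an invariant distribution of an appropriate policy. The tensor layout P[a, s, s'], the cost array c[a, s], and the tiny randomly generated MDP are illustrative assumptions.

import numpy as np

def aggregated_value_iteration(P, c, gamma, agg, w, num_iters=500):
    """Approximate value iteration with one constant cost-to-go value per partition.

    P   : transitions, shape (num_actions, num_states, num_states), row-stochastic in s'
    c   : per-action, per-state costs, shape (num_actions, num_states)
    agg : partition index of each state, shape (num_states,)
    w   : projection weights over states (e.g., an invariant distribution), shape (num_states,)
    """
    num_parts = agg.max() + 1
    r = np.zeros(num_parts)                      # one parameter per partition
    for _ in range(num_iters):
        J = r[agg]                               # lift partition values to states
        # Bellman backup at every state: minimize cost plus discounted expected value over actions
        TJ = (c + gamma * (P @ J)).min(axis=0)   # shape (num_states,)
        # Project the backed-up values onto the aggregation architecture using weights w
        for k in range(num_parts):
            mask = (agg == k)
            r[k] = np.average(TJ[mask], weights=w[mask])
    return r

# Tiny illustrative MDP (all numbers arbitrary, for demonstration only):
rng = np.random.default_rng(0)
num_actions, num_states = 2, 6
P = rng.random((num_actions, num_states, num_states))
P /= P.sum(axis=2, keepdims=True)                # normalize to row-stochastic transitions
c = rng.random((num_actions, num_states))
agg = np.array([0, 0, 1, 1, 2, 2])               # three partitions of two states each
w = np.full(num_states, 1.0 / num_states)        # uniform projection weights as a stand-in
print("partition values:", aggregated_value_iteration(P, c, gamma=0.9, agg=agg, w=w))

Replacing the uniform w with the invariant distribution of the policy greedy with respect to the fixed-point approximation is the kind of projection weighting the abstract relates to temporal-difference learning; the sketch above leaves that choice to the caller.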