We consider linear fixed point equations and their approximations by projection onto a low-dimensional subspace. We derive new bounds on the approximation error of the solution, which are expressed in terms of low-dimensional matrices and can be computed by simulation. When the fixed point mapping is a contraction, as is typically the case in Markov decision processes (MDPs), one of our bounds is always sharper than the standard contraction-based bounds, and another is often sharper. The former bound is also tight in a worst-case sense. Our bounds also apply to the noncontraction case, including policy evaluation in MDPs with nonstandard projections that enhance exploration. To our knowledge, no error bounds were previously available for this case.
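To make the setting concrete, here is a minimal sketch (plain NumPy; an illustration of the general setup, not the paper's method or its new bounds). It builds a small linear fixed point equation x = Ax + b whose mapping is a contraction of modulus alpha, solves its projection onto a random k-dimensional subspace, and compares the true approximation error with the standard contraction-based bound ||x* − Πx*|| / sqrt(1 − alpha²). The dimensions, random construction, and all variable names are assumptions chosen for illustration.

```python
# Illustrative sketch (not from the paper): projected solution of a linear
# fixed point equation x = A x + b, compared against the standard
# contraction-based error bound. All constructions here are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 10                       # ambient dimension, subspace dimension

# Build a contraction T(x) = A x + b with spectral norm alpha < 1.
alpha = 0.9
A = rng.standard_normal((n, n))
A *= alpha / np.linalg.norm(A, 2)    # rescale so ||A||_2 = alpha
b = rng.standard_normal(n)

# Exact fixed point: x* = (I - A)^{-1} b.
x_star = np.linalg.solve(np.eye(n) - A, b)

# Subspace S = span(Phi), orthogonal projection Pi = Phi Phi^T.
Phi, _ = np.linalg.qr(rng.standard_normal((n, k)))
Pi = Phi @ Phi.T

# Projected equation: Phi r = Pi (A Phi r + b). With orthonormal columns,
# this reduces to the low-dimensional k x k system
#   (I - Phi^T A Phi) r = Phi^T b.
r = np.linalg.solve(np.eye(k) - Phi.T @ A @ Phi, Phi.T @ b)
x_bar = Phi @ r                      # approximate solution within S

# True error vs. the standard contraction-based bound
#   ||x* - x_bar|| <= ||x* - Pi x*|| / sqrt(1 - alpha^2).
err = np.linalg.norm(x_star - x_bar)
bound = np.linalg.norm(x_star - Pi @ x_star) / np.sqrt(1 - alpha**2)
print(f"true error {err:.4f}  <=  contraction bound {bound:.4f}")
```

The reduction to a k × k system mirrors the point made above: the projected problem involves only low-dimensional matrices, which is what makes error bounds expressed in such quantities computable by simulation.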