An Upper Bound on the Loss from Approximate Optimal-Value Functions

Authors:
Satinder P. Singh; Richard C. Yee
Affiliations:
Department of Computer Science, University of Massachusetts, Amherst, MA 01003. singh@cs.umass.edu; Department of Computer Science, University of Massachusetts, Amherst, MA 01003. yee@cs.umass.edu
Venue:
Machine Learning
Year:
1994

Citing 0
Cited 22

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Machine Learning

A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
Machine Learning

Solving factored MDPs using non-homogeneous partitions
Artificial Intelligence - special issue on planning with uncertainty and incomplete information

Performance Loss Bounds for Approximate Value Iteration with State Aggregation
Mathematics of Operations Research

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
The Journal of Machine Learning Research

Automatic shaping and decomposition of reward functions
Proceedings of the 24th international conference on Machine learning

Near-optimal character animation with continuous control
ACM SIGGRAPH 2007 papers

Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
Artificial Intelligence

Planning and Learning in Environments with Delayed Feedback
ECML '07 Proceedings of the 18th European conference on Machine Learning

Learning and planning in environments with delayed feedback
Autonomous Agents and Multi-Agent Systems

Reinforcement Learning: A Tutorial Survey and Recent Advances
INFORMS Journal on Computing

Transfer via soft homomorphisms
Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2

Scheduling policy design for autonomic systems
International Journal of Autonomous and Adaptive Communications Systems

Lazy approximation for solving continuous finite-horizon MDPs
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 3

A sparse sampling algorithm for near-optimal planning in large Markov decision processes
IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2

Solving factored MDPs via non-homogeneous partitioning
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 1

Learning to act using real-time dynamic programming
Artificial Intelligence

Optimal stochastic policies for distributed data aggregation in wireless sensor networks
IEEE/ACM Transactions on Networking (TON)

Fuzzy decision tree function approximation in reinforcement learning
International Journal of Artificial Intelligence and Soft Computing

Reinforcement Learning in Finite MDPs: PAC Analysis
The Journal of Machine Learning Research

Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
Machine Learning

Reinforcement learning algorithms with function approximation: Recent advances and applications
Information Sciences: an International Journal

Abstract

Many reinforcement learning approaches can be formulated using the theory of Markov decision processes and the associated method of dynamic programming (DP). The value of this theoretical understanding, however, is tempered by many practical concerns. One important question is whether DP-based approaches that use function approximation rather than lookup tables can avoid catastrophic effects on performance. This note presents a result of Bertsekas (1987) which guarantees that small errors in the approximation of a task's optimal value function cannot produce arbitrarily bad performance when actions are selected by a greedy policy. We derive an upper bound on performance loss that is slightly tighter than that in Bertsekas (1987), and we show the extension of the bound to Q-learning (Watkins, 1989). These results provide a partial theoretical rationale for the approximation of value functions, an issue of great practical importance in reinforcement learning.
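
The kind of guarantee described in the abstract can be checked numerically on a toy problem. The sketch below is illustrative only and makes several assumptions that are not stated in the text above: the bound is taken in the form commonly quoted for this result, 2*gamma*eps/(1 - gamma), where eps is the maximum absolute error of the approximate value function and gamma is the discount factor; rewards depend on the state and action only; and the MDP, the helper names, and all parameter values are hypothetical. It builds a small random MDP, perturbs the optimal value function by at most eps, acts greedily with respect to the perturbed values, and compares the resulting loss with the assumed bound.

```python
import random

def q_from_v(P, R, gamma, V):
    """One-step lookahead: Q[s][a] = R[s][a] + gamma * sum_t P[s][a][t] * V[t]."""
    n = len(V)
    return [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
             for a in range(len(R[s]))] for s in range(n)]

def optimal_values(P, R, gamma, sweeps=2000):
    """Value iteration; with gamma = 0.9, 2000 sweeps is far past convergence."""
    V = [0.0] * len(R)
    for _ in range(sweeps):
        V = [max(row) for row in q_from_v(P, R, gamma, V)]
    return V

def greedy_policy(P, R, gamma, V):
    """Pick, in each state, the action maximizing the one-step lookahead on V."""
    Q = q_from_v(P, R, gamma, V)
    return [max(range(len(row)), key=row.__getitem__) for row in Q]

def policy_values(P, R, gamma, pi, sweeps=2000):
    """Iterative policy evaluation for a fixed deterministic policy pi."""
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s][pi[s]] + gamma * sum(P[s][pi[s]][t] * V[t] for t in range(n))
             for s in range(n)]
    return V

def random_mdp(rng, n_states, n_actions):
    """Hypothetical toy MDP: random transition rows (normalized) and rewards in [0, 1)."""
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R

rng = random.Random(0)
gamma, eps = 0.9, 0.05              # assumed discount factor and error level
P, R = random_mdp(rng, 5, 3)

V_star = optimal_values(P, R, gamma)
# Approximate value function: each entry is off by at most eps.
V_tilde = [v + rng.uniform(-eps, eps) for v in V_star]

pi = greedy_policy(P, R, gamma, V_tilde)
V_pi = policy_values(P, R, gamma, pi)

# Worst-case loss over states from following the greedy policy.
loss = max(vs - vp for vs, vp in zip(V_star, V_pi))
bound = 2 * gamma * eps / (1 - gamma)   # assumed form of the bound
print(f"loss = {loss:.6f}, assumed bound = {bound:.6f}")
```

On random instances like this one the observed loss is typically far below the bound, which is a worst-case statement; a single run is a sanity check, not evidence for the theorem.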