Many reinforcement learning approaches can be formulated using the theory of Markov decision processes and the associated method of dynamic programming (DP). The value of this theoretical understanding, however, is tempered by many practical concerns. One important question is whether DP-based approaches that use function approximation rather than lookup tables can avoid catastrophic effects on performance. This note presents a result of Bertsekas (1987) which guarantees that small errors in the approximation of a task's optimal value function cannot produce arbitrarily bad performance when actions are selected by a greedy policy. We derive an upper bound on performance loss that is slightly tighter than that in Bertsekas (1987), and we show the extension of the bound to Q-learning (Watkins, 1989). These results provide a partial theoretical rationale for the approximation of value functions, an issue of great practical importance in reinforcement learning.
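The bound described above can be checked empirically. The sketch below (not the paper's code; the MDP, state/action counts, and epsilon are illustrative assumptions) builds a small random MDP, computes the optimal value function V* by value iteration, perturbs it by at most eps, and verifies that the greedy policy with respect to the perturbed values loses at most 2*gamma*eps/(1 - gamma) in value at every state:

```python
# Empirical check (illustrative sketch only) of the performance-loss bound:
# if max_s |V(s) - V*(s)| <= eps, then the greedy policy pi w.r.t. V satisfies
# max_s (V*(s) - V_pi(s)) <= 2 * gamma * eps / (1 - gamma).
import random

random.seed(0)
S, A, gamma = 6, 3, 0.9  # small random MDP; sizes chosen arbitrarily

# P[s][a] is a probability vector over next states; R[s][a] is a reward.
P = [[[random.random() for _ in range(S)] for _ in range(A)] for _ in range(S)]
for s in range(S):
    for a in range(A):
        z = sum(P[s][a])
        P[s][a] = [p / z for p in P[s][a]]
R = [[random.random() for _ in range(A)] for _ in range(S)]

def q_values(V, s):
    """One-step lookahead values for each action in state s under V."""
    return [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(S))
            for a in range(A)]

# Value iteration to a (numerically) optimal V*.
V_star = [0.0] * S
for _ in range(2000):
    V_star = [max(q_values(V_star, s)) for s in range(S)]

# Perturb V* by at most eps, then act greedily w.r.t. the perturbed values.
eps = 0.05
V_approx = [v + random.uniform(-eps, eps) for v in V_star]
pi = [max(range(A), key=lambda a: q_values(V_approx, s)[a]) for s in range(S)]

# Evaluate the greedy policy by iterating its own Bellman operator.
V_pi = [0.0] * S
for _ in range(2000):
    V_pi = [q_values(V_pi, s)[pi[s]] for s in range(S)]

loss = max(V_star[s] - V_pi[s] for s in range(S))
bound = 2 * gamma * eps / (1 - gamma)
print(loss <= bound)  # the bound guarantees this holds
```

With small eps the greedy policy is often exactly optimal (loss near zero), which illustrates the note's point: small approximation errors in the value function cannot produce arbitrarily bad greedy-policy performance.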