Reinforcement learning promises a generic method for adapting agents to arbitrary tasks in arbitrary stochastic environments, but applying it to new real-world problems remains difficult, a few impressive success stories notwithstanding. Most interesting agent-environment systems have large state spaces, so performance depends crucially on efficient generalization from a small amount of experience. Current algorithms rely on model-free function approximation, which estimates the long-term values of states and actions directly from data and assumes that actions have similar values in similar states. This paper proposes model-based function approximation, which combines two forms of generalization by assuming that in addition to having similar values in similar states, actions also have similar effects. For one family of generalization schemes known as averagers, computation of an approximate value function from an approximate model is shown to be equivalent to the computation of the exact value function for a finite model derived from data. This derivation both integrates two independent sources of generalization and permits the extension of model-based techniques developed for finite problems. Preliminary experiments with a novel algorithm, AMBI (Approximate Models Based on Instances), demonstrate that this approach yields faster learning on some standard benchmark problems than many contemporary algorithms.
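The equivalence stated for averagers can be illustrated with a minimal sketch. This is not the AMBI algorithm as published. The Gaussian kernel, the bandwidth, the one-dimensional state space, the additive treatment of action effects, and every function name below are assumptions made for illustration. Under those assumptions, the sketch turns a set of observed transitions into a finite model over the sampled states and then solves that finite model exactly with value iteration. The model weights generalize action effects across similar states, and the value weights generalize values across similar states.

```python
import numpy as np

def normalized_kernel(queries, centers, bandwidth):
    """Row-stochastic Gaussian weights (an averager: nonnegative, rows sum to 1)."""
    d2 = (queries[:, None] - centers[None, :]) ** 2
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w / w.sum(axis=1, keepdims=True)

def build_finite_model(states, actions, rewards, next_states, n_actions, bandwidth):
    """Derive a finite MDP over the n sampled states from the instances.

    Assumes every action appears at least once in `actions`.
    """
    n = len(states)
    P = np.zeros((n_actions, n, n))
    R = np.zeros((n_actions, n))
    for a in range(n_actions):
        idx = np.flatnonzero(actions == a)
        # Model generalization: weight each instance of action a by how
        # similar its origin state is to each sampled state.
        w_model = normalized_kernel(states, states[idx], bandwidth)  # (n, k)
        R[a] = w_model @ rewards[idx]
        # Assumed form of "similar effects": reuse each instance's observed
        # displacement from every sampled state.
        deltas = next_states[idx] - states[idx]                      # (k,)
        predicted = states[:, None] + deltas[None, :]                # (n, k)
        for i in range(n):
            # Value generalization: express each predicted successor as a
            # weighted average of the sampled states.
            w_value = normalized_kernel(predicted[i], states, bandwidth)  # (k, n)
            P[a, i] = w_model[i] @ w_value
    return P, R

def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Exact value iteration on the finite model (P, R)."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)          # shape (n_actions, n)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q

# Hypothetical toy data: states in [0, 1]; action 0 steps left, action 1
# steps right; a reward of 1 is given for landing near the right edge.
rng = np.random.default_rng(0)
n = 60
states = rng.uniform(0.0, 1.0, size=n)
actions = np.arange(n) % 2
steps = np.where(actions == 1, 0.1, -0.1) + rng.normal(0.0, 0.01, size=n)
next_states = np.clip(states + steps, 0.0, 1.0)
rewards = (next_states >= 0.95).astype(float)

P, R = build_finite_model(states, actions, rewards, next_states,
                          n_actions=2, bandwidth=0.1)
V, Q = value_iteration(P, R)
greedy_actions = Q.argmax(axis=0)  # greedy action at each sampled state
```

Because both sets of weights are row-stochastic, each `P[a]` is a valid transition matrix over the sampled states, so techniques developed for finite models can be applied to it without modification. The value at a state outside the sample can be estimated in the same way, by averaging `V` with the same kernel weights.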