Model-based exploration in continuous state spaces

  • Authors:
  • Nicholas K. Jong; Peter Stone

  • Affiliations:
  • The University of Texas at Austin, Austin, TX; The University of Texas at Austin, Austin, TX

  • Venue:
  • SARA'07: Proceedings of the 7th International Symposium on Abstraction, Reformulation, and Approximation
  • Year:
  • 2007


Abstract

Modern reinforcement learning algorithms effectively exploit experience data sampled from an unknown controlled dynamical system to compute a good control policy, but to obtain the necessary data they typically rely on naive exploration mechanisms or human domain knowledge. Approaches that first learn a model offer improved exploration in finite problems, but discrete model representations do not extend directly to continuous problems. This paper develops a method for approximating continuous models by fitting data to a finite sample of states, leading to finite representations compatible with existing model-based exploration mechanisms. Experiments with the resulting family of fitted-model reinforcement learning algorithms reveal the critical importance of how the continuous model is generalized from finite data. The paper demonstrates instantiations of fitted-model algorithms that learn faster on benchmark problems than contemporary model-free RL algorithms, which apply generalization only in estimating action values. Finally, the paper concludes that in continuous problems, the exploration-exploitation tradeoff is better construed as a balance between exploration and generalization.
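
The core idea can be illustrated with a short sketch: generalize each observed transition onto a fixed finite sample of states via a kernel, then run standard model-based planning with optimism for poorly covered state-action pairs. This is a minimal sketch only; the Gaussian kernel, the R-max-style "known" threshold, and all class and parameter names below are illustrative assumptions, not the paper's exact instantiation.

    # Sketch of a fitted-model RL loop: a continuous MDP is approximated by a
    # finite sample of states, observed transitions are generalized onto those
    # states with a kernel, and unknown state-action pairs get an optimistic
    # R-max-style value. Names and parameters are illustrative assumptions.
    import numpy as np

    def gaussian_kernel(x, y, bandwidth=0.5):
        """Similarity weight used to generalize data across nearby states."""
        return np.exp(-np.sum((x - y) ** 2) / (2 * bandwidth ** 2))

    class FittedModel:
        def __init__(self, sample_states, n_actions, v_max, known_threshold=1.0):
            self.S = np.asarray(sample_states)  # (n, d) finite sample of states
            self.nA = n_actions
            self.v_max = v_max                  # optimistic value for unknown pairs
            self.known = known_threshold        # kernel mass needed to trust estimates
            n = len(self.S)
            self.weight = np.zeros((n, n_actions))     # accumulated kernel mass
            self.reward = np.zeros((n, n_actions))     # kernel-weighted reward sums
            self.next_p = np.zeros((n, n_actions, n))  # kernel-weighted transition counts

        def nearest(self, x):
            """Index of the sampled state closest to a continuous state x."""
            return int(np.argmin(np.sum((self.S - x) ** 2, axis=1)))

        def update(self, x, a, r, x_next):
            """Generalize one observed transition (x, a, r, x_next) onto the sample."""
            j = self.nearest(x_next)
            for i, s in enumerate(self.S):
                w = gaussian_kernel(x, s)
                self.weight[i, a] += w
                self.reward[i, a] += w * r
                self.next_p[i, a, j] += w

        def plan(self, gamma=0.99, sweeps=200):
            """Value iteration over the finite model; unknown pairs keep V_max."""
            n = len(self.S)
            V = np.zeros(n)
            for _ in range(sweeps):
                Q = np.full((n, self.nA), self.v_max, dtype=float)
                for a in range(self.nA):
                    m = self.weight[:, a] >= self.known   # "known" pairs only
                    if m.any():
                        r = self.reward[m, a] / self.weight[m, a]
                        P = self.next_p[m, a] / self.weight[m, a][:, None]
                        Q[m, a] = r + gamma * (P @ V)
                V = Q.max(axis=1)
            return Q

In this sketch, weight[i, a] measures how much experience generalizes to sampled state i under action a; pairs below the threshold keep the optimistic value, which drives exploration, while the kernel bandwidth controls how aggressively data is generalized, mirroring the exploration-versus-generalization balance the abstract describes.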