PAC model-free reinforcement learning

  • Authors:
  • Alexander L. Strehl (Rutgers University, Piscataway, NJ); Lihong Li (Rutgers University, Piscataway, NJ); Eric Wiewiora (University of California, San Diego); John Langford (TTI-Chicago, Chicago, IL); Michael L. Littman (Rutgers University, Piscataway, NJ)

  • Venue:
  • ICML '06: Proceedings of the 23rd International Conference on Machine Learning
  • Year:
  • 2006

Abstract

For a Markov Decision Process with finite state and action spaces (of sizes S and A per state, respectively), we propose a new algorithm, Delayed Q-learning. We prove it is PAC, achieving near-optimal performance on all but Õ(SA) timesteps while using O(SA) space, improving on the Õ(S²A) bounds of the best previous algorithms. This result shows that efficient reinforcement learning is possible without learning a model of the MDP from experience. Learning takes place from a single continuous thread of experience; no resets or parallel sampling are used. Beyond its smaller storage and experience requirements, Delayed Q-learning's per-experience computation cost is much less than that of previous PAC algorithms.
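
The abstract only names the algorithm; the following is a minimal Python sketch of the Delayed Q-learning update rule, not the paper's full procedure. Assumptions: rewards lie in [0, 1]; the hyperparameters m (samples per attempted update) and ε₁ (optimism bonus) are fixed for illustration rather than derived from the PAC parameters (ε, δ, S, A, γ) as in the paper; and the LEARN-flag bookkeeping that re-enables updates when other Q-values change is omitted.

```python
from collections import defaultdict

class DelayedQLearning:
    """Illustrative sketch of Delayed Q-learning; not the paper's full algorithm."""

    def __init__(self, actions, gamma=0.95, m=20, eps1=0.01):
        self.actions = list(actions)
        self.gamma = gamma
        self.m = m          # samples gathered before an attempted update (assumed value)
        self.eps1 = eps1    # optimism bonus epsilon_1 (assumed value)
        vmax = 1.0 / (1.0 - gamma)              # optimistic initialization, rewards in [0, 1]
        self.Q = defaultdict(lambda: vmax)      # O(SA) space: one value per (state, action)
        self.U = defaultdict(float)             # accumulated one-step update targets
        self.count = defaultdict(int)           # samples since the last attempted update
        self.learn = defaultdict(lambda: True)  # simplified LEARN flags

    def act(self, state):
        # Act greedily with respect to the optimistic Q-values.
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def observe(self, s, a, r, s_next):
        # Model-free, constant per-experience work: no transition or reward model is stored.
        if not self.learn[(s, a)]:
            return
        self.U[(s, a)] += r + self.gamma * max(self.Q[(s_next, b)] for b in self.actions)
        self.count[(s, a)] += 1
        if self.count[(s, a)] == self.m:
            avg_target = self.U[(s, a)] / self.m
            if self.Q[(s, a)] - avg_target >= 2 * self.eps1:
                # Successful attempted update: move to the averaged target plus a bonus.
                self.Q[(s, a)] = avg_target + self.eps1
            else:
                # Stop updating this pair; the full algorithm re-enables it when some
                # other Q-value changes, a detail this sketch omits.
                self.learn[(s, a)] = False
            self.U[(s, a)] = 0.0
            self.count[(s, a)] = 0
```

Batching m samples before each attempted update, rather than updating on every step, is what lets the analysis bound the total number of value changes and hence the number of suboptimal timesteps.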