Proceedings of the seventh international conference (1990) on Machine learning
TD-Gammon, a self-teaching backgammon program, achieves master-level play
Neural Computation
An introduction to computational learning theory
Empirical methods for artificial intelligence
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Near-Optimal Reinforcement Learning in Polynomial Time
Machine Learning
PEGASUS: A policy search method for large MDPs and POMDPs
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Probabilistic Planning in the Graphplan Framework
ECP '99 Proceedings of the 5th European Conference on Planning: Recent Advances in AI Planning
R-max - a general polynomial time algorithm for near-optimal reinforcement learning
The Journal of Machine Learning Research
Automated Planning: Theory & Practice
An analytic solution to discrete Bayesian reinforcement learning
ICML '06 Proceedings of the 23rd international conference on Machine learning
Dynamic Programming and Optimal Control, Vol. II
An analysis of model-based Interval Estimation for Markov Decision Processes
Journal of Computer and System Sciences
Near-Bayesian exploration in polynomial time
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Potential-based shaping in model-based reinforcement learning
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Real-time heuristic search with a priority queue
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
A decision-theoretic approach to task assistance for persons with dementia
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Learning to act using real-time dynamic programming
Artificial Intelligence
PAC-MDP algorithms tackle the exploration-exploitation problem of reinforcement learning agents with a guarantee that, with high probability, the algorithm performs near-optimally for all but a polynomial number of steps. The performance of these algorithms can be further improved by incorporating domain knowledge to guide the learning process. In this paper we propose a framework for using partial knowledge about the effects of actions in a theoretically well-founded way. Empirical evaluation shows that our method is more efficient than reward shaping, an alternative approach to incorporating background knowledge. Our solution is also competitive with the Bayesian Exploration Bonus (BEB) algorithm, which is not PAC-MDP but can exploit domain knowledge via informative priors. We show how to use the same kind of knowledge in the PAC-MDP framework in a way that preserves all the theoretical guarantees of PAC-MDP learning.
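The exploration-bonus idea behind BEB can be sketched in a few lines: the agent augments its reward estimate for a state-action pair with a bonus that decays as beta / (1 + n(s, a)), where n(s, a) is the visit count, so rarely tried actions look temporarily attractive. The sketch below is purely illustrative, not the paper's algorithm; the class name, the tabular Q-update, and the parameter values are all hypothetical choices made for the example.

```python
from collections import defaultdict

class BonusAgent:
    """Illustrative tabular agent with a BEB-style count-based exploration
    bonus. Hypothetical sketch, not the paper's PAC-MDP method."""

    def __init__(self, actions, beta=1.0, alpha=0.5, gamma=0.95):
        self.actions = list(actions)
        self.beta = beta      # bonus scale (tunable constant)
        self.alpha = alpha    # learning rate for the Q-update
        self.gamma = gamma    # discount factor
        self.q = defaultdict(float)  # Q(s, a) estimates, default 0
        self.n = defaultdict(int)    # visit counts n(s, a)

    def bonus(self, s, a):
        # Exploration bonus decays as beta / (1 + n(s, a)), as in BEB.
        return self.beta / (1.0 + self.n[(s, a)])

    def select(self, s):
        # Act greedily with respect to bonus-augmented values.
        return max(self.actions, key=lambda a: self.q[(s, a)] + self.bonus(s, a))

    def update(self, s, a, r, s_next):
        # Count the visit, then take a standard one-step Q-learning update.
        self.n[(s, a)] += 1
        target = r + self.gamma * max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
```

Because the bonus shrinks with every visit, the agent's effective optimism about a pair vanishes as its model of that pair improves; informative priors (or, in the paper's setting, partial knowledge about action effects) would correspond to starting some counts above zero.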