Reinforcement learning would enjoy greater success on real-world problems if modelers could impart domain knowledge to the algorithm. Most problems have both hidden state and unknown dynamics. Partially observable Markov decision processes (POMDPs) can model both, but they do not provide a natural framework for specifying partial knowledge of the domain dynamics: the designer must either claim to know nothing about the dynamics or specify them completely (reducing the problem to one of planning). We propose a new framework, the partially known Markov decision process (PKMDP), which lets the designer specify known dynamics while leaving other portions of the environment's dynamics unknown. The model represents not only the environment's dynamics but also the agent's knowledge of those dynamics. We present a reinforcement learning algorithm for this model based on importance sampling; it combines planning with the known dynamics and learning about the unknown dynamics. Our results demonstrate the ability to add domain knowledge and the resulting benefits for learning.
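
The abstract does not spell out the algorithm, but its core ingredient, importance-sampled off-policy return estimation, is standard and can be sketched briefly. The sketch below is an illustration under our own assumptions (the trajectory format and names such as estimate_return are ours), not the paper's actual method; the PKMDP algorithm layers planning with the known dynamics on top of this kind of estimator.

# Minimal sketch of off-policy return estimation via importance sampling.
# A trajectory is a list of (state, action, reward) tuples; a policy maps
# a state to a dict of action probabilities. All names are illustrative.

def importance_weight(trajectory, target_policy, behavior_policy):
    """Product of per-step likelihood ratios pi(a|s) / b(a|s)."""
    w = 1.0
    for state, action, _ in trajectory:
        w *= target_policy(state)[action] / behavior_policy(state)[action]
    return w

def estimate_return(trajectories, target_policy, behavior_policy, normalize=True):
    """Importance-sampled estimate of the target policy's expected return.

    Trajectories were collected under behavior_policy; each total return is
    reweighted by its trajectory's likelihood ratio under target_policy.
    """
    weights, returns = [], []
    for traj in trajectories:
        weights.append(importance_weight(traj, target_policy, behavior_policy))
        returns.append(sum(r for _, _, r in traj))
    total = sum(w * g for w, g in zip(weights, returns))
    # normalize=True gives the normalized (weighted) estimator: biased,
    # but typically far lower variance than the plain sample average.
    denom = sum(weights) if normalize else len(trajectories)
    return total / denom

The normalized variant trades a small bias for substantially lower variance, which matters when the behavior and target policies differ, so estimators of this form are a common choice in off-policy evaluation.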