Building relational world models for reinforcement learning

  • Authors:
  • Trevor Walker, Lisa Torrey, Jude Shavlik, Richard Maclin

  • Affiliations:
  • University of Wisconsin, Madison, WI (Walker, Torrey, Shavlik); University of Minnesota, Duluth, MN (Maclin)

  • Venue:
  • ILP'07: Proceedings of the 17th International Conference on Inductive Logic Programming
  • Year:
  • 2007

Abstract

Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their ability to exploit the relational nature of the domain. Our algorithm, AMBIL, constructs relational world models in the form of relational Markov decision processes (MDPs). AMBIL works backwards from collections of high-reward states, using inductive logic programming to learn their preimages: logical definitions of the regions of state space that lead to the high-reward states via some action. These learned preimages are chained together to form an MDP that abstractly represents the domain. AMBIL estimates the reward and transition probabilities of this MDP from past experience. Because the resulting MDPs are small, AMBIL uses value iteration to quickly estimate the Q-value of each action in the induced states and derive a policy. AMBIL can employ complex background knowledge and supports relational representations. Empirical evaluation on both synthetic domains and a sub-task of the RoboCup soccer domain shows significant performance gains over standard Q-learning.
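To make the planning step concrete, the sketch below shows how value iteration over a small, abstract MDP of the kind described in the abstract could be run. This is not the authors' code: the state names (preimage regions and a goal region), the single action, and all transition probabilities and rewards are hypothetical placeholders standing in for the preimages and statistics AMBIL learns from experience.

```python
def value_iteration(states, actions, transitions, rewards, gamma=0.9, tol=1e-6):
    """Compute Q-values and a greedy policy for a small abstract MDP.

    transitions[(s, a)] -> list of (next_state, probability)
    rewards[(s, a)]     -> expected immediate reward (0.0 if absent)
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best expected return over all actions in s.
            best = max(
                rewards.get((s, a), 0.0)
                + gamma * sum(p * V[s2] for s2, p in transitions.get((s, a), []))
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Derive Q-values and a greedy policy from the converged state values.
    Q = {
        (s, a): rewards.get((s, a), 0.0)
        + gamma * sum(p * V[s2] for s2, p in transitions.get((s, a), []))
        for s in states for a in actions
    }
    policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
    return Q, policy


# Hypothetical 3-state abstract MDP: two learned preimage regions chained
# toward a high-reward goal region, with estimated transition probabilities.
states = ["preimage2", "preimage1", "goal"]
actions = ["advance"]
transitions = {
    ("preimage2", "advance"): [("preimage1", 0.8), ("preimage2", 0.2)],
    ("preimage1", "advance"): [("goal", 0.9), ("preimage1", 0.1)],
    ("goal", "advance"): [("goal", 1.0)],
}
rewards = {("preimage1", "advance"): 0.9}

Q, policy = value_iteration(states, actions, transitions, rewards)
print(Q, policy)
```

Because the induced MDP has only a handful of abstract states, this backup loop converges in a few iterations, which is consistent with the abstract's claim that value iteration can be used to quickly determine a policy.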