Non-parametric policy gradients: a unified treatment of propositional and relational domains

Authors:
Kristian Kersting; Kurt Driessens
Affiliations:
Fraunhofer IAIS, Sankt Augustin, Germany; Katholieke Universiteit Leuven, Heverlee, Belgium
Venue:
Proceedings of the 25th international conference on Machine learning
Year:
2008



Abstract

Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult, if not impossible, to apply them within structured domains in which, for example, the number of objects and the relations among them vary. In this paper, we describe a non-parametric policy gradient approach, called NPPG, that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
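
The idea of growing a policy as a weighted sum of regression models can be illustrated with a minimal sketch. It is not the authors' implementation; several details are assumptions made only for illustration: a softmax (Gibbs) policy over a potential function, a toy two-state contextual bandit with a hypothetical `reward` function, a table-averaging `MeanRegressor` standing in for an off-the-shelf regression learner, and a fixed step size as the weight of each model. At every stage, pointwise gradient estimates are collected from sampled interactions, a regression model is fitted to them, and the model is added to the sum.

```python
import numpy as np

N_STATES, N_ACTIONS = 2, 2


def reward(state, action):
    # Hypothetical toy contextual bandit: the matching action pays 1.
    return 1.0 if action == state else 0.0


class MeanRegressor:
    """Stand-in regression learner (assumption): mean target per (state, action)."""

    def fit(self, pairs, targets):
        sums = np.zeros((N_STATES, N_ACTIONS))
        counts = np.zeros((N_STATES, N_ACTIONS))
        for (s, a), t in zip(pairs, targets):
            sums[s, a] += t
            counts[s, a] += 1
        # Pairs that were never observed keep a prediction of 0.
        self.table = sums / np.maximum(counts, 1)
        return self

    def predict(self, s, a):
        return self.table[s, a]


def potential(models, s, a):
    # The potential is a weighted sum of the regression models grown so far.
    return sum(w * m.predict(s, a) for w, m in models)


def policy(models, s):
    # Softmax policy over the potential (assumed policy form).
    psi = np.array([potential(models, s, a) for a in range(N_ACTIONS)], dtype=float)
    z = np.exp(psi - psi.max())
    return z / z.sum()


def boost(n_stages=30, n_samples=200, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    models = []  # list of (weight, regression model)
    for _ in range(n_stages):
        pairs, targets = [], []
        for _ in range(n_samples):
            s = int(rng.integers(N_STATES))
            probs = policy(models, s)
            a = int(rng.choice(N_ACTIONS, p=probs))
            r = reward(s, a)
            # Pointwise gradient of r * log pi(a|s) w.r.t. the potential at (s, b).
            for b in range(N_ACTIONS):
                indicator = 1.0 if b == a else 0.0
                pairs.append((s, b))
                targets.append(r * (indicator - probs[b]))
        # Stage-wise step: fit a regression model to the gradient estimates.
        models.append((step, MeanRegressor().fit(pairs, targets)))
    return models


if __name__ == "__main__":
    learned = boost()
    for s in range(N_STATES):
        print(s, policy(learned, s))
```

Because the policy touches its representation only through `fit` and `predict`, the table-based learner could in principle be swapped for a regression tree or a relational regression learner without changing the boosting loop, which is the kind of unification the abstract refers to.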