We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and a corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting directions for future work.
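To make the policy-space variant of API concrete, the following is a minimal sketch on a toy chain MDP. Instead of fitting a value function, each iteration labels sampled states with the rollout-best action and "learns" a new policy from those labels. The chain domain, the lookup-table learner, and all parameter values here are illustrative assumptions of this sketch; the actual system described in the abstract uses a relational policy language and learner.

```python
GOAL = 5  # states 0..GOAL on a line; reward only on reaching GOAL (sparse)

def step(state, action):
    """Deterministic transition: move left (-1) or right (+1), clamped to the chain."""
    nxt = max(0, min(GOAL, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0)

def rollout_value(state, action, policy, horizon=12, gamma=0.9):
    """Estimate Q(s, a) by taking `action` once, then following `policy`."""
    s, r = step(state, action)
    total, discount = r, gamma
    for _ in range(horizon):
        if s == GOAL:
            break
        s, r = step(s, policy(s))
        total += discount * r
        discount *= gamma
    return total

def improve(policy, states, actions=(-1, +1)):
    """One API iteration in policy space: label each sampled state with its
    rollout-best action, then return a policy fit to those labels.
    A lookup table stands in for the relational policy learner."""
    labels = {s: max(actions, key=lambda a: rollout_value(s, a, policy))
              for s in states}
    return lambda s: labels.get(s, -1)

policy = lambda s: -1                      # uninformed initial policy: always move left
for _ in range(GOAL):                      # reward information propagates backward
    policy = improve(policy, range(GOAL))  # one state per iteration

print([policy(s) for s in range(GOAL)])    # -> [1, 1, 1, 1, 1]: always move right
```

The sketch also illustrates why bootstrapping matters under sparse reward: starting from the uninformed policy, rollouts see nonzero reward only in states adjacent to the goal, so policy improvement crawls backward one state per iteration. In large relational domains that signal may never reach the rollouts at all, which is the motivation for the random-walk bootstrapping routine mentioned above.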