The factored policy-gradient planner

Authors:
Olivier Buffet; Douglas Aberdeen
Affiliations:
LORIA-INRIA, Nancy University, Nancy, France; Google Inc., Zurich, Switzerland
Venue:
Artificial Intelligence
Year:
2009

Abstract

We present an any-time concurrent probabilistic temporal planner (CPTP) that handles continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming, our approach builds on methods from stochastic local policy search: we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, its use of function approximation, and the factorisation of the policy, allows us to tackle complex domains. This factored policy-gradient (FPG) planner can optimise the number of steps to the goal, the probability of success, or a combination of both. We compare the FPG planner to other planners on CPTP domains and on simpler but better-studied non-concurrent, non-temporal probabilistic planning (PP) domains. We also present FPG-ipc, the PP version of the planner, which was successful in the probabilistic track of the fifth International Planning Competition.
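
To make the phrase "optimise a parameterised policy using gradient ascent" concrete, the following is a minimal sketch of a likelihood-ratio (REINFORCE-style) policy-gradient update on a toy one-step problem. It is not the FPG planner itself. Everything here is an assumption made for illustration: the function names (`sigmoid`, `train`), the reward (fraction of actions whose start/skip choice matches a hypothetical target pattern), the step size, and the moving-average baseline. The policy is "factored" only in the loose sense that each action has its own independent sub-policy with its own parameter, all updated from a shared reward signal.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real parameter to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def train(target, episodes=5000, alpha=0.1, seed=0):
    """Gradient ascent on a toy factored policy (illustrative only).

    Each action i has one parameter theta[i]; its sub-policy starts the
    action with probability sigmoid(theta[i]). The reward is the fraction
    of actions whose choice matches the hypothetical `target` pattern.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(target)
    baseline = 0.0  # moving average of the reward, used to reduce variance

    for _ in range(episodes):
        probs = [sigmoid(t) for t in theta]
        # Sample each sub-policy independently.
        choice = [1 if rng.random() < p else 0 for p in probs]
        matches = sum(1 for c, g in zip(choice, target) if c == g)
        reward = matches / len(target)

        advantage = reward - baseline
        for i in range(len(theta)):
            # For a Bernoulli(sigmoid(theta)) policy, the gradient of the
            # log-probability of the sampled choice is (choice - prob).
            theta[i] += alpha * advantage * (choice[i] - probs[i])

        baseline += 0.05 * (reward - baseline)

    return [sigmoid(t) for t in theta]


if __name__ == "__main__":
    # Actions 0 and 2 should be started, action 1 should not.
    print(train([1, 0, 1]))
```

Only the per-action parameters are stored, which hints at why memory use stays low: nothing is tabulated per state. A real planner would condition each sub-policy on state features through a function approximator and would estimate gradients over multi-step simulated trajectories rather than a single step.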