We study an approach to performing concurrent activities in Markov decision processes (MDPs) based on the coarticulation framework. We assume that the agent has multiple degrees of freedom (DOF) in its action space, which enable it to perform several activities simultaneously. We demonstrate that one natural way to generate concurrency in the system is to coarticulate among the set of learned activities available to the agent. Because of the multiple DOF, each learned activity is generally associated with a redundant set of admissible sub-optimal policies. Given a new task defined as a set of prioritized subgoals, this flexibility enables the agent to commit to several subgoals concurrently, according to their priority levels. We present efficient approximate algorithms for computing such policies and for generating concurrent plans, and we evaluate our approach in a simulated domain.
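The mechanism the abstract describes can be sketched concretely: each learned activity admits a redundant set of near-optimal actions per state, and coarticulation intersects these admissible sets in subgoal-priority order. The following is a minimal illustrative sketch, not the paper's algorithm; the Q-value tables, the `epsilon` admissibility threshold, and all names are assumptions introduced for illustration.

```python
# Illustrative sketch of coarticulation among prioritized subgoals.
# Assumptions: each subgoal has a tabular Q-function mapping
# (state, action) -> value; "admissible" means within epsilon of optimal.

def admissible_actions(q, state, actions, epsilon):
    """The redundant set of admissible sub-optimal actions for one
    learned activity: those within epsilon of the best value."""
    best = max(q[(state, a)] for a in actions)
    return {a for a in actions if q[(state, a)] >= best - epsilon}

def coarticulate(q_tables, state, actions, epsilon):
    """Narrow the action set by each subgoal's admissible set, from
    highest to lowest priority; stop narrowing if a lower-priority
    subgoal would leave no action at all."""
    candidates = set(actions)
    for q in q_tables:  # q_tables ordered by decreasing priority
        narrowed = candidates & admissible_actions(q, state, candidates, epsilon)
        if not narrowed:
            break  # cannot serve this subgoal without hurting higher ones
        candidates = narrowed
    # Break remaining ties in favor of the highest-priority subgoal.
    return max(candidates, key=lambda a: q_tables[0][(state, a)])

# Hypothetical example: two subgoals over four actions in one state.
actions = ["N", "S", "E", "W"]
q_primary = {("s", "N"): 1.0, ("s", "S"): 0.95, ("s", "E"): 0.2, ("s", "W"): 0.1}
q_secondary = {("s", "N"): 0.1, ("s", "S"): 0.9, ("s", "E"): 0.8, ("s", "W"): 0.2}

# "N" and "S" are both admissible for the primary subgoal (epsilon = 0.1);
# the secondary subgoal prefers "S" among them, so coarticulation picks "S".
choice = coarticulate([q_primary, q_secondary], "s", actions, 0.1)
```

Because the primary subgoal's redundancy leaves more than one admissible action, the agent can serve the secondary subgoal "for free", which is the flexibility the abstract attributes to multiple degrees of freedom.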