Exploiting structure to efficiently solve large scale partially observable Markov decision processes

Authors:
Pascal Poupart
Affiliations:
University of Toronto (Canada)
Venue:
Exploiting structure to efficiently solve large scale partially observable Markov decision processes
Year:
2005

Cited by (44):

An online POMDP algorithm for complex multiagent environments

Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
Partially observable Markov decision processes with imprecise parameters

Artificial Intelligence
Model-free reinforcement learning as mixture learning

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Design and prototype of a device to engage cognitively disabled older adults in visual artwork

Proceedings of the 2nd International Conference on PErvasive Technologies Related to Assistive Environments
Compact, convex upper bound iteration for approximate POMDP planning

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Point-based policy iteration

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Piecewise linear dynamic programming for constrained POMDPs

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 1
Symbolic heuristic search value iteration for factored POMDPs

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Perseus: randomized point-based value iteration for POMDPs

Journal of Artificial Intelligence Research
Online planning algorithms for POMDPs

Journal of Artificial Intelligence Research
A hybridized planner for stochastic domains

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
AEMS: an anytime online search algorithm for approximate policy refinement in large POMDPs

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
A decision-theoretic approach to task assistance for persons with dementia

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Greedy algorithms for sequential sensing decisions

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process

Computer Vision and Image Understanding
2-layer erroneous-plan recognition for dementia patients in smart homes

Healthcom'09 Proceedings of the 11th international conference on e-Health networking, applications and services
Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs

Autonomous Agents and Multi-Agent Systems
Towards relational POMDPs for adaptive dialogue management

ACLstudent '10 Proceedings of the ACL 2010 Student Research Workshop
Efficient planning in large POMDPs through policy graph based factorized approximations

ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Learning the behavior model of a robot

Autonomous Robots
POMDP solving: what rewards do you really expect at execution?

Proceedings of the 2010 conference on STAIRS 2010: Proceedings of the Fifth Starting AI Researchers' Symposium
Representing uncertainty about complex user goals in statistical dialogue systems

SIGDIAL '10 Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Rapid specification and automated generation of prompting systems to assist people with dementia

Pervasive and Mobile Computing
Efficient planning under uncertainty with macro-actions

Journal of Artificial Intelligence Research
Goal-oriented sensor selection for intelligent phones: (GOSSIP)

Proceedings of the 2011 international workshop on Situation activity & goal awareness
Analyzing and escaping local optima in planning as inference for partially observable domains

ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Adaptive decision support for structured organizations: a case for OrgPOMDPs

The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Escaping local optima in POMDP planning as inference

The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
A partition-based first-order probabilistic logic to represent interactive beliefs

SUM'11 Proceedings of the 5th international conference on Scalable uncertainty management
Decision Support in Organizations: A Case for OrgPOMDPs

WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 02
Reinforcement Learning of Communication in a Multi-agent Context

WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 02
The Benefit of Decomposing POMDP for Control of Gene Regulatory Networks

WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 02
An online POMDP algorithm used by the PoliceForce agents in the RoboCupRescue simulation

RoboCup 2005
Real-Time decision making for large POMDPs

AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
A statistical spoken dialogue system using complex user goals and value directed compression

EACL '12 Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics
People, sensors, decisions: Customizable and adaptive technologies for assistance in healthcare

ACM Transactions on Interactive Intelligent Systems (TiiS) - Special issue on highlights of the decade in interactive intelligent systems
Tractable POMDP representations for intelligent tutoring systems

ACM Transactions on Intelligent Systems and Technology (TIST) - Special section on agent communication, trust in multiagent systems, intelligent tutoring and coaching systems
A survey of point-based POMDP solvers

Autonomous Agents and Multi-Agent Systems
Applying POMDP to moving target optimization

Proceedings of the Eighth Annual Cyber Security and Information Intelligence Research Workshop
Decentralized multi-robot cooperation with auctioned POMDPs

International Journal of Robotics Research
Planning for multiple measurement channels in a continuous-state POMDP

Annals of Mathematics and Artificial Intelligence
Adaptive management of migratory birds under sea level rise

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Scheduling sensors for monitoring sentient spaces using an approximate POMDP policy

Pervasive and Mobile Computing
Interactive activity recognition and prompting to assist people with cognitive disabilities

Journal of Ambient Intelligence and Smart Environments - Home-based Health and Wellness Measurement and Monitoring

Abstract

Partially observable Markov decision processes (POMDPs) provide a natural and principled framework for modeling a wide range of sequential decision-making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand, for many real-world POMDPs it is possible to define effective policies with simple rules of thumb. This suggests that we may be able to find small policies that are near optimal. This thesis first presents a Bounded Policy Iteration (BPI) algorithm to robustly find a good policy represented by a small finite state controller. Real-world POMDPs also tend to exhibit structural properties that can be exploited to mitigate the effect of large state spaces. To that end, a value-directed compression (VDC) technique is also presented to reduce POMDP models to lower-dimensional representations. In practice, it is critical to simultaneously mitigate the impact of complex policy representations and large state spaces. Hence, this thesis describes three approaches that combine techniques capable of dealing with each source of intractability: VDC with BPI, VDC with Perseus (a randomized point-based value iteration algorithm by Spaan and Vlassis [136]), and state abstraction with Perseus. The scalability of those approaches is demonstrated on two problems with more than 33 million states: synthetic network management and a real-world system designed to assist elderly persons with cognitive deficiencies in carrying out simple daily tasks such as hand-washing. This represents an important step towards the deployment of POMDP techniques in ever larger, real-world, sequential decision-making problems.
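
To make the idea of a policy represented by a small finite state controller more concrete, the following is a minimal Python sketch of how such a controller could be evaluated on a tiny POMDP. It is not the BPI algorithm from the thesis: it only performs iterative policy evaluation for a fixed, deterministic controller. The two-state model, the two-node controller, and every name in it (`evaluate_controller`, `value_at_belief`, `node_action`, `node_next`) are hypothetical and chosen purely for illustration.

```python
def evaluate_controller(T, O, R, gamma, node_action, node_next,
                        tol=1e-10, max_iters=100_000):
    """Iteratively evaluate a deterministic finite state controller.

    T[a][s][s2] : probability of moving from state s to s2 under action a
    O[a][s2][z] : probability of observing z after action a lands in s2
    R[s][a]     : immediate reward for taking action a in state s
    node_action[n]  : action executed in controller node n
    node_next[n][z] : node entered after observing z in node n
    Returns V[n][s], the expected discounted return when the controller
    is in node n and the (hidden) state is s.
    """
    n_nodes, n_states = len(node_action), len(R)
    V = [[0.0] * n_states for _ in range(n_nodes)]
    for _ in range(max_iters):
        new_V, delta = [], 0.0
        for n in range(n_nodes):
            a = node_action[n]
            row = []
            for s in range(n_states):
                future = 0.0
                for s2 in range(n_states):
                    for z, p_z in enumerate(O[a][s2]):
                        future += T[a][s][s2] * p_z * V[node_next[n][z]][s2]
                v = R[s][a] + gamma * future
                delta = max(delta, abs(v - V[n][s]))
                row.append(v)
            new_V.append(row)
        V = new_V
        if delta < tol:  # stop once the values have stabilised
            break
    return V


def value_at_belief(V, node, belief):
    """Expected value of starting in `node` when the state follows `belief`."""
    return sum(b * v for b, v in zip(belief, V[node]))


# Hypothetical toy model: 2 states, 2 actions, 2 observations.
T = [[[0.9, 0.1], [0.1, 0.9]],   # action 0: state mostly persists
     [[0.5, 0.5], [0.5, 0.5]]]   # action 1: state is reset uniformly
O = [[[0.8, 0.2], [0.2, 0.8]],   # action 0: informative observation
     [[0.5, 0.5], [0.5, 0.5]]]   # action 1: uninformative observation
R = [[0.0, 1.0],                 # state 0: action 1 pays off
     [0.0, -1.0]]                # state 1: action 1 is penalised

# Hypothetical two-node controller: node 0 gathers information with action 0,
# node 1 commits to action 1 and then returns to node 0.
node_action = [0, 1]
node_next = [[1, 0],   # in node 0: observation 0 -> node 1, observation 1 -> node 0
             [0, 0]]   # in node 1: always return to node 0

V = evaluate_controller(T, O, R, 0.9, node_action, node_next)
print(value_at_belief(V, 0, [0.5, 0.5]))
```

The point of the sketch is that the policy is described by only a handful of nodes, regardless of how many beliefs the agent may encounter, which is why bounding controller size is one plausible way to tame a large policy space. For controllers and models of realistic size, the same values would more likely be obtained by solving the corresponding linear system directly.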