A globally optimal algorithm for TTD-MDPs

  • Authors:
  • Sooraj Bhat, David L. Roberts, Mark J. Nelson, Charles L. Isbell, Michael Mateas

  • Affiliations:
  • Georgia Institute of Technology (Bhat, Roberts, Nelson, Isbell); University of California, Santa Cruz (Mateas)

  • Venue:
  • Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS)
  • Year:
  • 2007


Abstract

In this paper, we discuss the use of Targeted Trajectory Distribution Markov Decision Processes (TTD-MDPs)---a variant of MDPs in which the goal is to realize a specified distribution of trajectories through a state space---as a general agent-coordination framework. We present several advances to previous work on TTD-MDPs. We improve on the existing algorithm for solving TTD-MDPs by deriving a greedy algorithm that computes a policy provably minimizing the global KL-divergence from the target distribution. We test the new algorithm by applying TTD-MDPs to drama management, where a system must coordinate the behavior of many agents to ensure that a game follows a coherent storyline, is in keeping with the author's desires, and offers a high degree of replayability. Although we show that suboptimal greedy strategies will fail in some cases, we validate previous work suggesting that they can work well in practice. We also show that our new algorithm provides guaranteed accuracy even in those cases, with little additional computational cost. Further, we illustrate how this new approach can be applied online, eliminating the memory-intensive offline sampling necessary in the previous approach.
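To make the core optimization concrete: in a TTD-MDP, each trajectory node needs an action distribution whose induced distribution over successor trajectories is as close as possible (in KL-divergence) to a target distribution. The sketch below is not the authors' algorithm; it is a minimal illustration of the per-node subproblem, solved by exponentiated gradient descent on the probability simplex. All names (`solve_node`, `P`, `target`) are hypothetical.

```python
# Illustrative sketch (NOT the paper's algorithm): at one trajectory node t,
# find an action distribution pi minimizing KL(p || q), where p is the target
# distribution over successor trajectories and q(t') = sum_a pi(a) * P(t'|t,a)
# is the distribution induced by the transition model P and the policy pi.
import math

def solve_node(P, target, iters=2000, lr=0.5):
    """P: dict action -> {successor: prob}; target: dict successor -> prob.
    Returns pi: dict action -> prob, minimizing KL(target || induced)."""
    actions = list(P)
    succs = list(target)
    pi = {a: 1.0 / len(actions) for a in actions}  # start uniform
    for _ in range(iters):
        # induced successor distribution under the current policy
        q = {s: sum(pi[a] * P[a].get(s, 0.0) for a in actions) for s in succs}
        # gradient of KL(p||q) w.r.t. pi[a]: -sum_s p(s) * P(s|a) / q(s)
        grad = {a: -sum(target[s] * P[a].get(s, 0.0) / q[s]
                        for s in succs if q[s] > 0) for a in actions}
        # exponentiated-gradient step keeps pi on the probability simplex
        w = {a: pi[a] * math.exp(-lr * grad[a]) for a in actions}
        z = sum(w.values())
        pi = {a: w[a] / z for a in actions}
    return pi

# When each action leads deterministically to a distinct successor,
# the optimal policy simply matches the target distribution:
P = {"a1": {"s1": 1.0}, "a2": {"s2": 1.0}}
pi = solve_node(P, {"s1": 0.7, "s2": 0.3})
```

In the deterministic example above, `pi` converges to roughly `{"a1": 0.7, "a2": 0.3}`. The interesting (and harder) case, which the paper addresses globally rather than node-by-node, is when actions have stochastic outcomes and no policy can match the target exactly.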