Authorial idioms for target distributions in TTD-MDPs

Authors:
David L. Roberts;Sooraj Bhat;Kenneth St. Clair;Charles L. Isbell
Affiliations:
College of Computing, Georgia Institute of Technology, Atlanta, GA (all four authors)
Venue:
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Year:
2007

Abstract

In designing Markov Decision Processes (MDPs), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there is no clear choice of reward function, and in these cases significant care must be taken to construct a reward function that induces the desired behavior. In this paper, we consider an analogous design problem: crafting a target distribution in Targeted Trajectory Distribution MDPs (TTD-MDPs). TTD-MDPs produce probabilistic policies that minimize divergence from a target distribution over the trajectories of an underlying MDP. They extend MDPs to provide a variety of experience during repeated execution. Here, we present a brief overview of TTD-MDPs together with approaches for constructing target distributions. We then present a novel authorial idiom for creating target distributions from prototype trajectories. We evaluate these approaches on a drama manager for an interactive game.
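The following is a minimal, hypothetical sketch of the ideas named in the abstract, not the authors' method. It assumes trajectories are tuples of discrete story events. It also assumes a made-up `event_distance` function and an exponential similarity kernel for turning prototype trajectories into a target distribution. It measures divergence with KL divergence. Finally, it derives a probabilistic next-event policy from the target under the simplifying assumption that every action deterministically produces the event it selects. The names `target_from_prototypes`, `kl_divergence`, `next_event_distribution`, and the `temperature` parameter are all illustrative.

```python
import math
from typing import Dict, Sequence, Tuple

Trajectory = Tuple[str, ...]
Distribution = Dict[Trajectory, float]


def event_distance(a: Trajectory, b: Trajectory) -> int:
    """Hypothetical distance: mismatched positions plus the length difference."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))


def target_from_prototypes(
    trajectories: Sequence[Trajectory],
    prototypes: Sequence[Trajectory],
    temperature: float = 1.0,
) -> Distribution:
    """Assumed idiom: weight each trajectory by exp(-d / T), where d is the
    distance to its nearest prototype, then normalize to sum to 1."""
    weights = {}
    for t in trajectories:
        d = min(event_distance(t, p) for p in prototypes)
        weights[t] = math.exp(-d / temperature)
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def kl_divergence(p: Distribution, q: Distribution) -> float:
    """KL(p || q) over trajectories; assumes q[t] > 0 wherever p[t] > 0."""
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)


def next_event_distribution(target: Distribution, prefix: Trajectory) -> Dict[str, float]:
    """Probability of each next event given a prefix: target mass of the
    extended prefix divided by target mass of the prefix. With deterministic
    actions (an assumption), sampling this way reproduces the target."""
    k = len(prefix)
    mass: Dict[str, float] = {}
    for t, pt in target.items():
        if len(t) > k and t[:k] == prefix:
            mass[t[k]] = mass.get(t[k], 0.0) + pt
    total = sum(mass.values())
    return {e: m / total for e, m in mass.items()} if total > 0 else {}
```

For example, with the four orderings `("a","b","c")`, `("a","c","b")`, `("b","a","c")`, `("b","c","a")` and the single prototype `("a","b","c")`, the prototype receives the most target mass. The policy at the empty prefix then favors starting with event `"a"`. Lowering `temperature` concentrates the distribution around the prototypes. Raising it moves the distribution toward uniform, trading authorial control for variety.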