In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing expected reward. We consider the cumulative intermediate reward to be the difference between our score and our opponent's score; the "true" reward of a win, loss, or tie is determined at the end of the game by applying a threshold function to the cumulative intermediate reward. We introduce thresholded-rewards problems to capture this dependency of the final reward outcome on the cumulative intermediate reward. Thresholded-rewards problems arise in a variety of real-world stochastic planning domains, especially zero-sum games, in which both time and score must be considered. We investigate the application of thresholded rewards to finite-horizon Markov Decision Processes (MDPs). In general, the optimal policy for a thresholded-rewards MDP is non-stationary: it depends on the number of time steps remaining and on the cumulative intermediate reward. We present an efficient value iteration algorithm that solves thresholded-rewards MDPs exactly, with running time quadratic in the number of states in the MDP and the length of the time horizon. We also investigate several heuristic techniques that efficiently find approximate solutions for MDPs with large state spaces or long time horizons.
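To make the exact algorithm concrete, the following Python sketch runs value iteration over the augmented state space of (world state, steps remaining, cumulative intermediate reward) triples. The names here (solve_thresholded_mdp, P, R, rmax) are illustrative assumptions, not the paper's notation; rmax bounds the magnitude of any one-step intermediate reward, which confines the cumulative reward after k steps to [-k*rmax, k*rmax] and makes the quadratic dependence on the horizon visible.

```python
def solve_thresholded_mdp(states, actions, P, R, horizon, threshold, rmax):
    """Exact value iteration for a thresholded-rewards MDP (a sketch).

    P[(s, a)] is a list of (next_state, probability) pairs, R[(s, a, s2)]
    an integer intermediate reward with |R| <= rmax, and threshold(r)
    maps the final cumulative intermediate reward to the true reward
    (e.g. +1 / 0 / -1 for win / tie / loss). All of these names are
    assumptions for illustration, not taken from the paper.
    """
    V = {}       # V[(s, r)]: value with t-1 steps remaining
    policy = {}  # policy[(t, s, r)]: best action with t steps remaining
    # Sweep backwards over the number of steps remaining.
    for t in range(1, horizon + 1):
        elapsed = horizon - t  # steps already played when t remain
        newV = {}
        for s in states:
            # The cumulative reward after `elapsed` steps is bounded by
            # rmax per step, so each layer holds O(|S| * horizon * rmax)
            # entries; with `horizon` layers, the overall cost is
            # quadratic in the horizon, matching the abstract.
            for r in range(-elapsed * rmax, elapsed * rmax + 1):
                best_a, best_q = None, float("-inf")
                for a in actions:
                    q = 0.0
                    for s2, p in P[(s, a)]:
                        r2 = r + R[(s, a, s2)]
                        # With no steps left after this one, the threshold
                        # function alone determines the true reward.
                        v = threshold(r2) if t == 1 else V[(s2, r2)]
                        q += p * v
                    if q > best_q:
                        best_a, best_q = a, q
                newV[(s, r)] = best_q
                policy[(t, s, r)] = best_a
        V = newV
    return policy
```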
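A small usage sketch, with invented numbers, shows why the optimal policy is non-stationary: an action with lower expected intermediate reward but higher variance becomes optimal when trailing near the end of the game.

```python
# Hypothetical toy game: "safe" scores +1 or -1 with even odds (expected
# value 0); "risky" scores +2 with probability 0.4, else -2 (expected
# value -0.4). The world state only records the last outcome.
states = ["up", "down"]
actions = ["safe", "risky"]
P = {(s, "safe"): [("up", 0.5), ("down", 0.5)] for s in states}
P.update({(s, "risky"): [("up", 0.4), ("down", 0.6)] for s in states})
R = {(s, a, s2): (1 if s2 == "up" else -1) * (1 if a == "safe" else 2)
     for s in states for a in actions for s2 in states}

def threshold(r):  # win/tie/loss payoff applied at the end of the game
    return 1 if r > 0 else (0 if r == 0 else -1)

policy = solve_thresholded_mdp(states, actions, P, R,
                               horizon=3, threshold=threshold, rmax=2)
print(policy[(1, "up", 0)])   # tied on the last step     -> "safe"
print(policy[(1, "up", -1)])  # trailing on the last step -> "risky"
```

Although "risky" has lower expected intermediate reward, it maximizes the probability of winning when trailing with one step left: "safe" can at best force a tie, while "risky" wins outright 40% of the time. This is exactly the dependence on time remaining and cumulative intermediate reward described above.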