Simulation, learning, and optimization techniques in Watson's game strategies

Authors:
G. Tesauro; D. C. Gondek; J. Lenchner; J. Fan; J. M. Prager
Affiliations:
IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY (all authors)
Venue:
IBM Journal of Research and Development
Year:
2012

Abstract

The game of Jeopardy!™ features four types of strategic decision-making: 1) Daily Double wagering; 2) Final Jeopardy! wagering; 3) selecting the next square when in control of the board; and 4) deciding whether to attempt to answer, i.e., "buzz in." Strategies that properly account for the game state and future event probabilities can yield a substantial boost in overall winning chances compared with simple "rule-of-thumb" strategies. In this paper, we present an approach to developing and testing components that make these strategy decisions, founded upon reasonably faithful simulation models of the players and the Jeopardy! game environment. We describe machine learning and Monte Carlo methods used in simulations to optimize the respective strategy algorithms. Application of these methods yielded superhuman game strategies for IBM Watson that significantly enhanced its overall competitive record.
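The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea behind Monte Carlo wager optimization, applied to Final Jeopardy! betting. The simplifying assumptions are ours, not the paper's: a two-player endgame, independent right/wrong outcomes with fixed probabilities `p_me` and `p_opp`, an opponent who always bets a fixed amount `opp_bet`, and a tie scored as half a win. The function names `estimate_win_prob` and `best_final_wager` are hypothetical. The sketch samples outcomes once and reuses them for every candidate wager (common random numbers), so differences between wagers are not masked by sampling noise.

```python
import random


def estimate_win_prob(my_score, opp_score, my_bet, opp_bet, outcomes):
    """Average win credit over simulated (i_am_right, opp_is_right) outcomes.

    Reusing the same `outcomes` list for every candidate wager ("common
    random numbers") keeps wager-to-wager comparisons low-noise.
    """
    total = 0.0
    for me_right, opp_right in outcomes:
        me = my_score + (my_bet if me_right else -my_bet)
        opp = opp_score + (opp_bet if opp_right else -opp_bet)
        if me > opp:
            total += 1.0
        elif me == opp:
            total += 0.5  # assumption: a tie is scored as half a win
    return total / len(outcomes)


def best_final_wager(my_score, opp_score, p_me, p_opp, opp_bet,
                     step=100, n_trials=20_000, seed=0):
    """Grid-search our Final Jeopardy! wager by Monte Carlo simulation.

    Hypothetical model, not Watson's: two players, independent correctness
    with fixed probabilities, and a fixed opponent wager.
    """
    rng = random.Random(seed)
    # Sample every trial's outcome once, up front.
    outcomes = [(rng.random() < p_me, rng.random() < p_opp)
                for _ in range(n_trials)]
    best_bet, best_wp = 0, -1.0
    for bet in range(0, my_score + 1, step):
        wp = estimate_win_prob(my_score, opp_score, bet, opp_bet, outcomes)
        if wp > best_wp:  # strict ">" keeps the smallest of equally good wagers
            best_bet, best_wp = bet, wp
    return best_bet, best_wp


if __name__ == "__main__":
    bet, wp = best_final_wager(20_000, 10_000, p_me=0.7, p_opp=0.5,
                               opp_bet=10_000)
    print(bet, round(wp, 3))
```

Under these toy assumptions, a 20,000 to 10,000 leader facing an all-in opponent gains by betting at least 100: a correct answer then guarantees the win, whereas a zero bet risks a tie whenever the opponent is right. The estimated win probability is about 0.85 for any wager from 100 to 19,900 versus about 0.75 for betting nothing, and the search returns the smallest such wager.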