Major advances in Question Answering technology were needed for IBM Watson to play Jeopardy! at championship level: the show requires rapid-fire answers to challenging natural language questions, broad general knowledge, high precision, and accurate confidence estimates. In addition, Jeopardy! features four types of decision making that carry great strategic importance: (1) Daily Double wagering; (2) Final Jeopardy wagering; (3) selecting the next square when in control of the board; and (4) deciding whether to attempt to answer, i.e., "buzz in." Sophisticated strategies for these decisions that properly account for the game state and future event probabilities can significantly boost a player's overall chances of winning compared with simple "rule of thumb" strategies. This article presents our approach to developing Watson's game-playing strategies, which comprises developing a faithful simulation model and then using learning and Monte Carlo methods within the simulator to optimize Watson's strategic decision-making. After giving a detailed description of each of our game-strategy algorithms, we focus in particular on validating the accuracy of the simulator's predictions and documenting the performance improvements obtained with our methods. Quantitative performance benefits are shown with respect to both simple heuristic strategies and actual human contestant performance in historical episodes. We further extend our analysis of human play to derive a number of valuable and counterintuitive examples illustrating how human contestants may improve their performance on the show.
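To make the idea of optimizing a wagering decision by Monte Carlo simulation concrete, the following is a minimal toy sketch and not Watson's actual method. The names `toy_win_probability` and `choose_wager`, the two-player setup, the fixed wager step, and the score-share value estimate are all assumptions introduced here for illustration; the article's simulator and learned value estimates are far richer.

```python
import random


def toy_win_probability(score, opponent_score):
    """Hypothetical stand-in for a learned win estimate: our share of total score."""
    total = score + opponent_score
    if total <= 0:
        return 0.5
    return max(score, 0) / total


def choose_wager(score, opponent_score, p_correct,
                 step=100, n_trials=2000, seed=0):
    """Pick the wager with the highest simulated average win estimate.

    Assumption: wagers range from 0 to the current score in increments of
    `step`, and the answer is correct with probability `p_correct`.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    best_wager, best_value = 0, -1.0
    for wager in range(0, score + 1, step):
        total_value = 0.0
        for _ in range(n_trials):
            # Sample one outcome of the wager and score the resulting state.
            correct = rng.random() < p_correct
            new_score = score + wager if correct else score - wager
            total_value += toy_win_probability(new_score, opponent_score)
        value = total_value / n_trials
        if value > best_value:
            best_wager, best_value = wager, value
    return best_wager, best_value


if __name__ == "__main__":
    wager, value = choose_wager(score=5000, opponent_score=3000, p_correct=0.7)
    print(f"chosen wager: {wager}, estimated win value: {value:.3f}")
```

Under these assumptions, a player who is certain to answer correctly bets everything and a player who is certain to miss bets nothing; intermediate confidence levels trade off the two outcomes through the value estimate, which is where accurate confidence estimates and a faithful simulator matter.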