Unknown rewards in finite-horizon domains

Authors:
Colin McMillen;Manuela Veloso
Affiliations:
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA;Computer Science Department, Carnegie Mellon University, Pittsburgh, PA
Venue:
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Year:
2008

Citing 4
Cited 0

Functional value iteration for decision-theoretic planning with general utility functions
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 2

Thresholded rewards: acting optimally in timed, zero-sum games
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2

Reinforcement learning: a survey
Journal of Artificial Intelligence Research

CAPTCHA: using hard AI problems for security
EUROCRYPT'03 Proceedings of the 22nd international conference on Theory and applications of cryptographic techniques

Abstract

"Human computation" is a recent approach that extracts information from large numbers of Web users. reCAPTCHA is a human computation project that improves the process of digitizing books by getting humans to read words that are difficult for OCR algorithms to read (von Ahn et al., 2008). In this paper, we address an interesting strategic control problem inspired by the reCAPTCHA project: given a large set of words to transcribe within a time deadline, how can we choose the difficulty level such that we maximize the probability of successfully transcribing a document on time? Our approach is inspired by previous work on timed, zero-sum games, as we face an analogous timed policy decision on the choice of words to present to users. However, our Web-based word transcribing domain is particularly challenging as the reward of the actions is not known; i.e., there is no knowledge if the spelling provided by a human is actually correct. We contribute an approach to solve this problem by checking a small fraction of the answers at execution time, obtaining an estimate of the cumulative reward. We present experimental results showing how the number of samples and time between samples affects the probability of success. We also investigate the choice of aggressive or conservative actions with regard to the bounds produced by sampling. We successfully apply our algorithm to real data gathered by the reCAPTCHA project.