The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

  • Authors:
  • Ilya O. Ryzhov; Warren B. Powell; Peter I. Frazier

  • Affiliations:
  • Robert H. Smith School of Business, University of Maryland, College Park, Maryland 20742; Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544; Department of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853

  • Venue:
  • Operations Research
  • Year:
  • 2012

Abstract

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach handles the case where our prior beliefs about the rewards are correlated, which traditional multiarmed bandit methods do not address. Experiments show that our knowledge gradient (KG) policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and that it outperforms many learning policies in the correlated case.