The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

Authors:
Ilya O. Ryzhov; Warren B. Powell; Peter I. Frazier
Affiliations:
Robert H. Smith School of Business, University of Maryland, College Park, Maryland 20742; Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544; Department of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853
Venue:
Operations Research
Year:
2012

Abstract

We derive a one-period look-ahead policy, the knowledge gradient (KG) policy, for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach can handle correlated prior beliefs about the rewards, a case that traditional multiarmed bandit methods do not handle. Experiments show that the KG policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and that it outperforms many other learning policies in the correlated case.
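The abstract describes the KG policy only at a high level. As a rough illustration, the Python sketch below scores each alternative x by its current mean plus a weighted knowledge-gradient factor. It rests on assumptions of our own: a multivariate normal belief N(mu, cov) over the rewards, independent Gaussian measurement noise with known variances noise_var, and a look-ahead weight that we take to be N - n - 1 for a finite horizon of N measurements (at time n) or gamma / (1 - gamma) for a discount factor gamma. The KG factor, the expected one-step gain in the best posterior mean, is computed exactly from the upper envelope of the lines mu_i + b_i z, a standard technique for correlated normal beliefs. All function names (expected_max_gain, kg_factors, online_kg_choice, update_beliefs) are hypothetical and not taken from the paper.

```python
import math

# Hypothetical sketch of an online KG rule under a correlated normal belief.
# `weight` is the assumed look-ahead multiplier (e.g. gamma / (1 - gamma)).


def _f(z):
    # f(z) = z * Phi(z) + phi(z), with Phi/phi the standard normal cdf/pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return z * cdf + pdf


def expected_max_gain(a, b):
    """E[max_i (a_i + b_i Z)] - max_i a_i for Z ~ N(0, 1), computed exactly
    from the upper envelope of the lines a_i + b_i * z."""
    # Sort by slope; among equal slopes only the largest intercept matters.
    lines = []
    for i in sorted(range(len(a)), key=lambda i: (b[i], a[i])):
        if lines and b[lines[-1]] == b[i]:
            lines.pop()  # same slope, intercept no larger than a[i]
        lines.append(i)
    env, cuts = [], []  # envelope lines and the z where each takes over
    for j in lines:
        while env:
            k = env[-1]
            c = (a[k] - a[j]) / (b[j] - b[k])  # z where line j overtakes k
            if len(env) > 1 and c <= cuts[-1]:
                env.pop()  # line k is never the maximum
                cuts.pop()
            else:
                break
        cuts.append(c if env else -math.inf)
        env.append(j)
    # Sum of slope increments times f(-|breakpoint|) over the envelope.
    return sum((b[env[i + 1]] - b[env[i]]) * _f(-abs(cuts[i + 1]))
               for i in range(len(env) - 1))


def kg_factors(mu, cov, noise_var):
    """KG factor nu_x for every alternative x under belief N(mu, cov)."""
    factors = []
    for x in range(len(mu)):
        scale = math.sqrt(cov[x][x] + noise_var[x])
        # How much each posterior mean moves per unit of standardized noise.
        b = [cov[i][x] / scale for i in range(len(mu))]
        factors.append(expected_max_gain(mu, b))
    return factors


def online_kg_choice(mu, cov, noise_var, weight):
    """Choose argmax_x mu_x + weight * nu_x."""
    nu = kg_factors(mu, cov, noise_var)
    scores = [m + weight * v for m, v in zip(mu, nu)]
    return max(range(len(mu)), key=scores.__getitem__)


def update_beliefs(mu, cov, noise_var, x, y):
    """Bayesian update of (mu, cov) after observing reward y from x."""
    denom = cov[x][x] + noise_var[x]
    col = [cov[i][x] for i in range(len(mu))]
    new_mu = [m + (y - mu[x]) * c / denom for m, c in zip(mu, col)]
    new_cov = [[cov[i][j] - col[i] * col[j] / denom for j in range(len(mu))]
               for i in range(len(mu))]
    return new_mu, new_cov
```

With a diagonal cov, the factor reduces to the familiar independent-beliefs form, since only the measured alternative's mean can move. With correlated beliefs, measuring one alternative also shifts the means of the others, which is how this kind of rule can exploit correlations that index policies for independent arms ignore.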