Suppose that we have a set of emissions reduction technologies whose greenhouse gas abatement potential is unknown, and we wish to find an optimal portfolio (subset) of these technologies. Because technologies interact, the effectiveness of a portfolio can only be observed through expensive field implementations. We view this as an online optimal learning problem with correlated prior beliefs, in which the observed performance of a portfolio in one project guides portfolio choices for future projects. Given the large number of potential portfolios, we propose a learning policy that first uses Monte Carlo sampling to narrow the choice set down to a relatively small number of promising portfolios, and then applies a one-period look-ahead approach based on knowledge gradients to choose among this reduced set. We present experimental evidence that this policy is competitive with other online learning policies that consider the entire choice set.
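The two-stage policy described above can be sketched in code. This is only an illustrative approximation under assumptions of our own (multivariate-normal beliefs over portfolio values, known Gaussian observation noise, and a Monte Carlo estimate of the knowledge-gradient factor); the function names and parameters are ours, not the paper's.

```python
import numpy as np

def monte_carlo_shortlist(mu, Sigma, n_samples, shortlist_size, rng):
    """Stage 1: sample portfolio values from the correlated prior
    N(mu, Sigma) and keep the portfolios that most often rank best."""
    draws = rng.multivariate_normal(mu, Sigma, size=n_samples)
    wins = np.bincount(draws.argmax(axis=1), minlength=len(mu))
    return np.argsort(wins)[::-1][:shortlist_size]

def kg_factor(mu, Sigma, x, noise_var, n_sim, rng):
    """Stage 2: Monte Carlo estimate of the knowledge-gradient factor
    for measuring portfolio x, i.e. the expected one-step gain in the
    best posterior mean (correlated normal beliefs, noise variance
    noise_var)."""
    # Direction in which all posterior means move after observing x.
    sigma_tilde = Sigma[:, x] / np.sqrt(Sigma[x, x] + noise_var)
    z = rng.standard_normal(n_sim)
    # Simulated posterior means: mu + Z * sigma_tilde for standard normal Z.
    post = mu[None, :] + z[:, None] * sigma_tilde[None, :]
    return post.max(axis=1).mean() - mu.max()

# Toy instance with a synthetic correlated prior (illustrative only).
rng = np.random.default_rng(0)
n = 20
mu = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T / n + np.eye(n)  # symmetric positive definite covariance

shortlist = monte_carlo_shortlist(mu, Sigma, n_samples=2000,
                                  shortlist_size=5, rng=rng)
kgs = {x: kg_factor(mu, Sigma, x, noise_var=1.0, n_sim=4000, rng=rng)
       for x in shortlist}
best = max(kgs, key=kgs.get)  # portfolio to implement next
```

The shortlist step keeps the per-period cost independent of the full (combinatorially large) set of portfolios, while the knowledge-gradient step accounts for how one observation updates beliefs about correlated portfolios.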