Suppose that we have a set of emissions reduction technologies whose greenhouse gas abatement potential is unknown, and we wish to find an optimal portfolio (subset) of these technologies. Because technologies interact, the effectiveness of a portfolio can only be observed through expensive field implementations. We view this as an online optimal learning problem with correlated prior beliefs, in which the observed performance of a portfolio in one project guides portfolio choices for future projects. Given the large number of potential portfolios, we propose a learning policy that first uses Monte Carlo sampling to narrow the choice set down to a relatively small number of promising portfolios, and then applies a one-period look-ahead approach based on knowledge gradients to choose among this reduced set. We present experimental evidence that this policy is competitive with other online learning policies that consider the entire choice set.
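The two-stage policy described above can be sketched in code. This is only an illustrative approximation under assumptions of our own (multivariate-normal beliefs over portfolio values, known Gaussian observation noise, and a Monte Carlo estimate of the knowledge-gradient factor); the function names and parameters are ours, not the paper's.

```python
import numpy as np

def monte_carlo_shortlist(mu, Sigma, n_samples, shortlist_size, rng):
    """Stage 1: sample portfolio values from the correlated prior
    N(mu, Sigma) and keep the portfolios that most often rank best."""
    draws = rng.multivariate_normal(mu, Sigma, size=n_samples)
    wins = np.bincount(draws.argmax(axis=1), minlength=len(mu))
    return np.argsort(wins)[::-1][:shortlist_size]

def kg_factor(mu, Sigma, x, noise_var, n_sim, rng):
    """Stage 2: Monte Carlo estimate of the knowledge-gradient factor
    for measuring portfolio x, i.e. the expected one-step gain in the
    best posterior mean (correlated normal beliefs, noise variance
    noise_var)."""
    # Direction in which all posterior means move after observing x.
    sigma_tilde = Sigma[:, x] / np.sqrt(Sigma[x, x] + noise_var)
    z = rng.standard_normal(n_sim)
    # Simulated posterior means: mu + Z * sigma_tilde for standard normal Z.
    post = mu[None, :] + z[:, None] * sigma_tilde[None, :]
    return post.max(axis=1).mean() - mu.max()

# Toy instance with a synthetic correlated prior (illustrative only).
rng = np.random.default_rng(0)
n = 20
mu = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T / n + np.eye(n)  # symmetric positive definite covariance

shortlist = monte_carlo_shortlist(mu, Sigma, n_samples=2000,
                                  shortlist_size=5, rng=rng)
kgs = {x: kg_factor(mu, Sigma, x, noise_var=1.0, n_sim=4000, rng=rng)
       for x in shortlist}
best = max(kgs, key=kgs.get)  # portfolio to implement next
```

The shortlist step keeps the per-period cost independent of the full (combinatorially large) set of portfolios, while the knowledge-gradient step accounts for how one observation updates beliefs about correlated portfolios.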