We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach handles the case where prior beliefs about the rewards are correlated, which traditional multi-armed bandit methods do not address. Experiments show that our knowledge-gradient (KG) policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and that it outperforms many learning policies in the correlated case.
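To make the one-period look-ahead idea concrete, the sketch below computes KG factors for the simplest setting: independent normal posteriors with known sampling-noise variance, using the standard closed form sigma_tilde * (z*Phi(z) + phi(z)). This is a minimal illustration, not the paper's method: the function names (`kg_factors`, `kg_choice`) and the `noise_sd` parameter are illustrative, and the correlated-beliefs extension described in the abstract would replace the scalar posterior update with a multivariate one.

```python
import math

def norm_pdf(z):
    # Standard normal density phi(z).
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # Standard normal CDF Phi(z), via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kg_factors(mu, sigma, noise_sd):
    """KG factor for each arm under independent normal posteriors
    N(mu[i], sigma[i]^2) and known noise std. dev. noise_sd."""
    factors = []
    for i in range(len(mu)):
        # Std. dev. of the one-step change in arm i's posterior mean.
        sigma_tilde = sigma[i] ** 2 / math.sqrt(sigma[i] ** 2 + noise_sd ** 2)
        best_other = max(mu[j] for j in range(len(mu)) if j != i)
        z = -abs(mu[i] - best_other) / sigma_tilde
        # Expected one-step improvement in the best posterior mean.
        factors.append(sigma_tilde * (z * norm_cdf(z) + norm_pdf(z)))
    return factors

def kg_choice(mu, sigma, noise_sd):
    # One-period look-ahead: sample the arm whose measurement is
    # expected to improve the final decision the most.
    f = kg_factors(mu, sigma, noise_sd)
    return max(range(len(f)), key=f.__getitem__)
```

With equal posterior means, the factor grows with posterior uncertainty, so the policy samples the arm it knows least about; as beliefs sharpen, the factors shrink and the policy shifts toward exploitation.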