Optimal learning for sequential sampling with non-parametric beliefs
Journal of Global Optimization
We propose a sequential sampling policy for noisy discrete global optimization and ranking and selection, in which we aim to efficiently explore a finite set of alternatives before selecting one as best when exploration stops. Each alternative may be characterized by a multi-dimensional vector of categorical and numerical attributes and has independent normal rewards. We place a Bayesian probability model on the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy, which myopically maximizes the expected increment in the value of sampling information at each time period. We propose a hierarchical aggregation technique that exploits the features shared by alternatives to learn about many alternatives from even a single measurement. This approach greatly reduces the measurement effort required, although it needs some prior knowledge of the smoothness of the function, in the form of an aggregation function, and computational cost limits the number of alternatives that can easily be handled to a few thousand. We prove that our policy is consistent, finding a globally optimal alternative given enough measurements, and show through simulations that it performs competitively with, or significantly better than, other policies.
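To make the knowledge-gradient idea concrete, the sketch below computes the standard closed-form knowledge-gradient factor for a set of alternatives with independent normal beliefs and a common known measurement-noise variance, and picks the alternative with the largest factor to measure next. This is a minimal illustration of the underlying policy for independent beliefs, not the paper's hierarchical-aggregation version; the function names and the example numbers are illustrative.

```python
import numpy as np
from math import erf, exp, sqrt, pi

def _phi(z):  # standard normal pdf
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def _Phi(z):  # standard normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def kg_policy(mu, sigma2, noise2):
    """Return (index to measure next, KG factors) under independent
    normal beliefs N(mu[i], sigma2[i]) and a common measurement-noise
    variance noise2. Illustrative sketch of the KG rule for
    independent beliefs (no hierarchical aggregation)."""
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    # std dev of the change in the belief about i after one more sample of i
    sigma_tilde = sigma2 / np.sqrt(sigma2 + noise2)
    kg = np.empty(len(mu))
    for i in range(len(mu)):
        # best competing mean: the largest mean among the other alternatives
        other = np.max(np.delete(mu, i))
        z = -abs(mu[i] - other) / sigma_tilde[i]
        # expected one-step increment in the value of the best alternative
        kg[i] = sigma_tilde[i] * (z * _Phi(z) + _phi(z))
    return int(np.argmax(kg)), kg

# Two alternatives with equal means: KG prefers the more uncertain one,
# since measuring it is more likely to change which alternative looks best.
choice, factors = kg_policy([0.0, 0.0], [1.0, 0.01], 1.0)
```

With equal means the tie is broken by uncertainty: the high-variance alternative has a larger `sigma_tilde`, hence a larger expected value of information, so `choice` is 0 here. In the paper's setting this factor is further combined with the hierarchical aggregation estimate, which lets one measurement update beliefs about many similar alternatives at once.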