Optimal learning for sequential sampling with non-parametric beliefs

Authors: Emre Barut; Warren B. Powell
Affiliation (both authors): Department of Operations Research and Financial Engineering, Princeton University, Princeton, USA 08544
Venue: Journal of Global Optimization
Year: 2014

Abstract

We propose a sequential learning policy for ranking and selection problems that uses a non-parametric procedure to estimate the value of a policy. The estimation approach aggregates over a set of kernel estimators to obtain a more consistent estimator, with each kernel estimator in the set using a different bandwidth. The final estimate is a weighted average of the kernel estimates, with weights proportional to the inverse mean squared errors of the individual estimators; this weighting scheme is shown to be optimal when the kernel estimators are independent. To choose each measurement, we employ the knowledge gradient policy, which relies on predictive distributions to calculate the optimal sampling point. The method accommodates settings in which beliefs are expected to be correlated but the correlation structure is not known in advance. Moreover, the proposed policy is shown to be asymptotically optimal.
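The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It assumes a one-dimensional input, a Gaussian kernel, and Nadaraya-Watson estimators, and it substitutes a leave-one-out squared prediction error as a stand-in for each estimator's mean squared error. The helper names (`nw_estimate`, `loo_mse`, `aggregated_estimate`) are hypothetical, and the paper's own MSE estimate and kernel choices may differ.

```python
import math


def gaussian_kernel(u):
    # Standard normal density used as the smoothing kernel K(u).
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_estimate(x, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate of the mean response at x with bandwidth h.

    If `skip` is an index, that observation is left out (for leave-one-out error).
    Assumes at least one observation remains after skipping.
    """
    num, den, used = 0.0, 0.0, []
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = gaussian_kernel((x - xi) / h)
        num += w * yi
        den += w
        used.append(yi)
    if den == 0.0:
        # All kernel weights underflowed; fall back to the plain sample mean.
        return sum(used) / len(used)
    return num / den


def loo_mse(xs, ys, h):
    # Leave-one-out squared prediction error: a crude, global stand-in for the
    # estimator's MSE (an assumption of this sketch, not the paper's formula).
    errs = [(ys[i] - nw_estimate(xs[i], xs, ys, h, skip=i)) ** 2
            for i in range(len(xs))]
    return sum(errs) / len(errs)


def aggregated_estimate(x, xs, ys, bandwidths):
    """Combine kernel estimates at several bandwidths using inverse-MSE weights."""
    estimates = [nw_estimate(x, xs, ys, h) for h in bandwidths]
    mses = [loo_mse(xs, ys, h) for h in bandwidths]
    inv = [1.0 / max(m, 1e-12) for m in mses]  # guard against a zero error estimate
    total = sum(inv)
    weights = [v / total for v in inv]  # normalized so the weights sum to 1
    value = sum(w * e for w, e in zip(weights, estimates))
    return value, weights


# Usage: noiseless samples of sin(x) on a grid, three candidate bandwidths.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [math.sin(x) for x in xs]
value, weights = aggregated_estimate(1.2, xs, ys, bandwidths=[0.2, 0.5, 1.0])
print(round(sum(weights), 6))  # prints 1.0
```

Normalizing the inverse-MSE values makes the result a convex combination of the individual kernel estimates. For independent unbiased estimators, weights proportional to the inverse variances minimize the variance of such a combination, which is the intuition behind the optimality statement for independent estimators; estimators computed at different bandwidths from the same samples are generally correlated, so in this sketch the weights are a heuristic.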