We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual edge values in order to learn about the best path. This problem differs from traditional ranking and selection in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure). Our decision rule is easy to compute and performs competitively against other learning policies, including a Monte Carlo adaptation of the knowledge gradient policy for ranking and selection.
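For context, the knowledge gradient idea the abstract builds on can be sketched for standard ranking and selection with independent normal beliefs and known measurement noise. This is a minimal illustration of the generic rule from the ranking-and-selection literature, not the authors' graph-specific decision rule; the function names are illustrative.

```python
import math


def kg_values(mu, sigma2, noise_var):
    """Knowledge-gradient value of measuring each alternative once.

    mu, sigma2: current posterior means and variances of each alternative.
    noise_var: known variance of a single noisy measurement.
    """
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # N(0,1) pdf
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # N(0,1) cdf
    vals = []
    for x, (m, s2) in enumerate(zip(mu, sigma2)):
        # Predictive std. dev. of the change in the posterior mean of x
        # after one more measurement: sigma_tilde^2 = s2^2 / (s2 + noise_var).
        sigma_tilde = s2 / math.sqrt(s2 + noise_var)
        # Normalized gap to the best competing mean.
        best_other = max(m2 for y, m2 in enumerate(mu) if y != x)
        z = -abs(m - best_other) / sigma_tilde
        # Expected improvement in the maximum of the means: sigma_tilde * f(z),
        # with f(z) = z * Phi(z) + phi(z).
        vals.append(sigma_tilde * (z * Phi(z) + phi(z)))
    return vals


def kg_choose(mu, sigma2, noise_var):
    """Measure the alternative whose single measurement is worth the most."""
    vals = kg_values(mu, sigma2, noise_var)
    return max(range(len(vals)), key=vals.__getitem__)
```

With equal means, the rule prefers the alternative with the higher posterior variance, since a measurement there changes the belief about the best alternative the most. The graph problem in the paper is harder precisely because the measured object (an edge) and the implemented object (a path) differ.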