AAAI '98/IAAI '98 Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence
A Bayesian Framework for Reinforcement Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Pricing and the News Vendor Problem: A Review with Extensions
Operations Research
The Censored Newsvendor and the Optimal Acquisition of Information
Operations Research
Q-Learning for Bandit Problems
Bias and Variance Approximation in Value Function Estimates
Management Science
A Knowledge-Gradient Policy for Sequential Information Collection
SIAM Journal on Control and Optimization
A Multiperiod Newsvendor Problem with Partially Observed Demand
Mathematics of Operations Research
Online Planning Algorithms for POMDPs
Journal of Artificial Intelligence Research
Sequential Sampling to Myopically Maximize the Expected Value of Information
INFORMS Journal on Computing
Information Collection on a Graph
Operations Research
Model-Based Bayesian Exploration
UAI '99 Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence
May the Best Man Win: Simulation Optimization for Match-Making in E-Sports
Proceedings of the Winter Simulation Conference
We examine a newsvendor problem with two agents: a requesting agent that observes private demand information, and an oversight agent that must decide how to allocate resources upon receiving a bid from the requesting agent. Because the two agents have different cost structures, the requesting agent tends to bid higher than the amount it actually needs. The oversight agent must therefore adaptively learn to interpret the bids and estimate the requesting agent's bias. Learning must occur as quickly as possible, because each suboptimal resource allocation incurs an economic cost. We present a mathematical model that casts the problem as a Markov decision process with unknown transition probabilities, and we perform a simulation study comparing four techniques for optimal learning of the transition probabilities. The best-performing technique is a knowledge gradient algorithm, based on a one-period look-ahead approach.
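The paper's two-agent model is not reproduced here, but the knowledge-gradient idea it applies can be sketched in its standard form (cf. the Frazier et al. knowledge-gradient paper cited above): with independent normal beliefs over the value of each alternative, the one-period look-ahead computes, for each alternative, the expected improvement in the best posterior mean from one more measurement, and measures the alternative maximizing that quantity. A minimal sketch, assuming known Gaussian measurement noise; all names and numbers below are illustrative, not from the paper:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def knowledge_gradient(mu, sigma, noise_sd):
    """One-period look-ahead KG factors for independent normal beliefs.

    mu[i], sigma[i] -- prior mean and standard deviation for alternative i
    noise_sd       -- known standard deviation of a single noisy measurement
    Returns the KG value (expected one-step gain) of measuring each alternative.
    """
    kg = []
    for i in range(len(mu)):
        # Std. dev. of the change in the posterior mean from one observation of i
        sigma_tilde = sigma[i] ** 2 / math.sqrt(sigma[i] ** 2 + noise_sd ** 2)
        best_other = max(mu[j] for j in range(len(mu)) if j != i)
        z = -abs(mu[i] - best_other) / sigma_tilde
        # Expected improvement of the best mean: sigma_tilde * (z*Phi(z) + phi(z))
        kg.append(sigma_tilde * (z * norm_cdf(z) + norm_pdf(z)))
    return kg

# The KG policy measures the alternative with the highest KG factor;
# here the uncertain alternative 0 wins despite a lower prior mean.
mu = [1.0, 1.2, 0.8]
sigma = [2.0, 0.5, 1.0]
choice = max(range(len(mu)), key=lambda i: knowledge_gradient(mu, sigma, 1.0)[i])
```

In the example, the high-variance alternative is chosen even though its prior mean is not the largest, which is exactly the exploration behavior the look-ahead buys: information that could change which alternative looks best is worth paying for.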