Motivated by applications in financial services, we consider a seller who offers prices sequentially to a stream of potential customers, observing either success or failure in each sales attempt. The parameters of the underlying demand model are initially unknown, so each price decision involves a trade-off between learning and earning. Attention is restricted to the simplest kind of model uncertainty, where one of two demand models is known to apply, and we focus initially on performance of the myopic Bayesian policy (MBP), variants of which are commonly used in practice. Because learning is passive under the MBP (that is, learning only takes place as a by-product of actions that have a different purpose), it can lead to incomplete learning and poor profit performance. However, under one additional assumption, a constrained variant of the myopic policy is shown to have the following strong theoretical virtue: the expected performance gap relative to a clairvoyant who knows the underlying demand model is bounded by a constant as the number of sales attempts becomes large. This paper was accepted by Gérard P. Cachon, stochastic models and simulation.
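The mechanics described above — a seller quoting prices sequentially, observing a binary sale/no-sale outcome, and updating a two-point prior over candidate demand models before myopically choosing the next price — can be sketched in code. Everything concrete below is an illustrative assumption, not the paper's specification: the logistic purchase-probability curves, their parameters, the price grid, and the function names are all hypothetical stand-ins for "one of two known demand models".

```python
import math
import random

# Hypothetical two-point demand uncertainty: under model k, a customer offered
# price p buys with probability demand(k, p). The logistic form and the
# parameters (a, b) are illustrative assumptions only.
def demand(k, p):
    a, b = [(2.0, 1.0), (1.0, 0.5)][k]
    return 1.0 / (1.0 + math.exp(-(a - b * p)))

def posterior_update(q, p, sale):
    """Bayes update of q = P(model 0) after observing a sale/no-sale at price p."""
    l0 = demand(0, p) if sale else 1.0 - demand(0, p)
    l1 = demand(1, p) if sale else 1.0 - demand(1, p)
    return q * l0 / (q * l0 + (1.0 - q) * l1)

def myopic_price(q, grid):
    """MBP step: price maximizing one-period expected revenue under the posterior."""
    return max(grid, key=lambda p: p * (q * demand(0, p) + (1.0 - q) * demand(1, p)))

def run_mbp(true_model, horizon, q0=0.5, seed=0):
    """Simulate the myopic Bayesian policy against a fixed true demand model."""
    rng = random.Random(seed)
    grid = [0.5 + 0.1 * i for i in range(40)]  # candidate prices (assumption)
    q, revenue = q0, 0.0
    for _ in range(horizon):
        p = myopic_price(q, grid)
        sale = rng.random() < demand(true_model, p)
        revenue += p if sale else 0.0
        # Learning is passive: q moves only as a by-product of the sales outcomes
        # at the myopically chosen price, which is what can stall learning.
        q = posterior_update(q, p, sale)
    return q, revenue
```

Note the sketch implements only the unconstrained MBP; the paper's constrained variant, which carries the constant regret bound, would additionally restrict the admissible prices to keep the two models statistically distinguishable.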