We consider a stylized dynamic pricing model in which a monopolist prices a product to a sequence of $T$ customers, who independently make purchasing decisions based on the price offered according to a general parametric choice model. The parameters of the model are unknown to the seller, whose objective is to determine a pricing policy that minimizes the regret, which is the expected difference between the seller's revenue and the revenue of a clairvoyant seller who knows the values of the parameters in advance and always offers the revenue-maximizing price. We show that the regret of the optimal pricing policy in this model is $\Theta(\sqrt{T})$, by establishing an $\Omega(\sqrt{T})$ lower bound on the worst-case regret under an arbitrary policy, and presenting a pricing policy based on maximum-likelihood estimation whose regret is $\mathcal{O}(\sqrt{T})$ across all problem instances. Furthermore, we show that when the demand curves satisfy a “well-separated” condition, the $T$-period regret of the optimal policy is $\Theta(\log T)$. Numerical experiments show that our policies perform well.
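To make the regret definition concrete, the following is a minimal sketch rather than the paper's policy. It assumes a hypothetical two-parameter logit choice model with parameters `a` and `b`, and restricts prices to a finite grid; neither choice comes from the abstract. It computes the clairvoyant revenue-maximizing price and the expected regret of a given price sequence relative to it.

```python
import math

def purchase_prob(price, a, b):
    """Hypothetical logit choice model (an assumption, not from the paper):
    P(buy | price) = 1 / (1 + exp(-(a - b * price)))."""
    return 1.0 / (1.0 + math.exp(-(a - b * price)))

def expected_revenue(price, a, b):
    """Expected one-period revenue from offering `price` to one customer."""
    return price * purchase_prob(price, a, b)

def clairvoyant_price(a, b, grid):
    """Price a seller who knows (a, b) would offer, restricted to `grid`."""
    return max(grid, key=lambda p: expected_revenue(p, a, b))

def expected_regret(prices, a, b, grid):
    """Sum over periods of the expected revenue gap to the clairvoyant seller.
    Nonnegative whenever every offered price lies on `grid`."""
    best = expected_revenue(clairvoyant_price(a, b, grid), a, b)
    return sum(best - expected_revenue(p, a, b) for p in prices)

# Illustrative values only: prices 0.01, 0.02, ..., 10.00 and a = 2, b = 1.
grid = [i / 100 for i in range(1, 1001)]
a, b = 2.0, 1.0
p_star = clairvoyant_price(a, b, grid)

# A seller who never learns and always charges 1.0 for T = 100 periods.
regret_fixed = expected_regret([1.0] * 100, a, b, grid)
print(p_star, regret_fixed)
```

A fixed suboptimal price loses a constant amount per period, so its regret grows linearly in $T$. The abstract's result is that a learning policy can reduce this to order $\sqrt{T}$ in general, and to order $\log T$ under the well-separated condition.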