We consider a model of repeated online auctions in which an ad with an uncertain click-through rate faces a random distribution of competing bids in each auction and payoffs are discounted. We formulate the optimal solution to this explore/exploit problem as a dynamic programming problem and show that efficiency is maximized by placing, for each advertiser, a bid equal to the advertiser's expected value for the advertising opportunity plus a term proportional to the variance of this value divided by the number of impressions the advertiser has received thus far. We then use this result to illustrate that the value of incorporating active exploration into a machine learning system in an auction environment is exceedingly small.
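The shape of the bid rule described above (expected value plus a bonus proportional to variance divided by impressions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Beta posterior over the click-through rate, the uniform prior, the function name `exploration_bid`, and the proportionality constant `k`, whose actual value would come from the paper's dynamic programming analysis.

```python
def exploration_bid(value_per_click, clicks, impressions,
                    k=1.0, prior_a=1.0, prior_b=1.0):
    """Hypothetical bid: expected value plus k * variance / impressions.

    Assumes a Beta(prior_a, prior_b) prior on the click-through rate;
    k is an illustrative constant, not a value from the paper.
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if not 0 <= clicks <= impressions:
        raise ValueError("clicks must lie between 0 and impressions")

    # Posterior Beta parameters after observing clicks and non-clicks.
    a = prior_a + clicks
    b = prior_b + (impressions - clicks)

    # Posterior mean and variance of the click-through rate.
    ctr_mean = a / (a + b)
    ctr_var = (a * b) / ((a + b) ** 2 * (a + b + 1))

    # Value of an impression = value per click * click-through rate.
    expected_value = value_per_click * ctr_mean
    value_var = value_per_click ** 2 * ctr_var

    # Exploration bonus: proportional to variance / impressions so far.
    bonus = k * value_var / impressions
    return expected_value + bonus


if __name__ == "__main__":
    # Same observed click rate (10%), increasing amounts of data.
    for n in (10, 100, 1000):
        bid = exploration_bid(1.0, clicks=n // 10, impressions=n)
        print(f"impressions={n:5d}  bid={bid:.6f}")
```

Under these assumptions the bonus shrinks quickly as impressions accumulate, because both the posterior variance and the 1/impressions factor decline. This is consistent with the abstract's conclusion that the value of active exploration is small.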