We examine the problem of allocating an item repeatedly over time among a set of agents. The value that each agent derives from consuming the item may vary over time. Furthermore, this value is private information to the agent, and prior to consumption it may be unknown even to that agent. We describe a mechanism based on a sampling-based learning algorithm that, under suitable assumptions, is asymptotically individually rational, asymptotically Bayesian incentive compatible, and asymptotically ex ante efficient. Our mechanism can be interpreted as a pay-per-action or pay-per-acquisition (PPA) charging scheme in online advertising. In this scheme, instead of paying per click, advertisers pay only when a user takes a specific action (e.g., purchases an item or fills out a form) on their websites.
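To make the idea of a sampling-based allocation rule concrete, the following is a minimal illustrative sketch and not the mechanism analyzed in the paper. It assumes Bernoulli per-round values, truthful reports, a decaying exploration probability of t^(-1/3), and a second-price style charge in exploitation rounds; the function name `run_ppa_sketch` and all parameters are hypothetical choices made for illustration only.

```python
import random


def run_ppa_sketch(true_means, rounds=10000, seed=0):
    """Toy explore/exploit allocation loop (illustrative assumptions only).

    true_means: hypothetical per-agent probability that an allocation
    produces an action worth 1.0. Requires at least two agents.
    """
    rng = random.Random(seed)
    n = len(true_means)
    totals = [0.0] * n       # sum of values observed in exploration rounds
    counts = [0] * n         # number of exploration samples per agent
    allocations = [0] * n    # how often each agent received the item
    revenue = 0.0
    welfare = 0.0

    for t in range(1, rounds + 1):
        # Assumed schedule: explore with probability t^(-1/3).
        explore_prob = min(1.0, t ** (-1.0 / 3.0))
        estimates = [
            totals[i] / counts[i] if counts[i] else 0.0 for i in range(n)
        ]

        if rng.random() < explore_prob:
            # Exploration: allocate uniformly at random, charge nothing,
            # and use the (assumed truthful) report to update estimates.
            i = rng.randrange(n)
            value = 1.0 if rng.random() < true_means[i] else 0.0
            totals[i] += value
            counts[i] += 1
        else:
            # Exploitation: allocate to the highest estimate and charge
            # the second-highest estimate. Estimates are not updated here.
            ranked = sorted(range(n), key=lambda j: estimates[j], reverse=True)
            i = ranked[0]
            value = 1.0 if rng.random() < true_means[i] else 0.0
            revenue += estimates[ranked[1]]

        allocations[i] += 1
        welfare += value

    return {"allocations": allocations, "revenue": revenue, "welfare": welfare}


if __name__ == "__main__":
    print(run_ppa_sketch([0.2, 0.5, 0.8], rounds=20000, seed=1))
```

In this sketch, estimates are built only from exploration rounds, so the price charged in an exploitation round does not depend on anything the winner reports in that round. Whether such a rule has the asymptotic properties stated above depends on the assumptions of the paper, which the sketch does not attempt to reproduce.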