Automatic ad format selection via contextual bandits

Authors:
Liang Tang;Romer Rosales;Ajit Singh;Deepak Agarwal
Affiliations:
Florida International Univ., Miami, FL, USA;LinkedIn, Mountain View, CA, USA;LinkedIn, Mountain View, CA, USA;LinkedIn, Mountain View, CA, USA
Venue:
Proceedings of the 22nd ACM international conference on information & knowledge management
Year:
2013

Abstract

Visual design plays an important role in online display advertising: changing the layout of an online ad can increase or decrease its effectiveness, measured in terms of click-through rate (CTR) or total revenue. The decision of which layout to use for an ad involves a trade-off: using a layout provides feedback about its effectiveness (exploration), but collecting that feedback requires sacrificing the immediate reward of using a layout we already know is effective (exploitation). To balance exploration with exploitation, we pose automatic layout selection as a contextual bandit problem. There are many bandit algorithms, each generating a policy which must be evaluated. It is impractical to test each policy on live traffic. However, we have found that offline replay (a.k.a. exploration scavenging) can be adapted to provide an accurate estimator for the performance of ad layout policies at LinkedIn, using only historical data about the effectiveness of layouts. We describe the development of our offline replayer, and benchmark a number of common bandit algorithms.
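
To make the idea of offline replay concrete, the following is a minimal sketch, not the replayer described in the paper. It assumes a hypothetical log of `(context, layout, click)` tuples in which the logged layout was chosen uniformly at random, and it uses a simple epsilon-greedy policy as a stand-in for the bandit algorithms being benchmarked. The names `EpsilonGreedy` and `replay` are illustrative only.

```python
import random
from collections import defaultdict


class EpsilonGreedy:
    """Hypothetical epsilon-greedy policy with one CTR estimate per (context, layout)."""

    def __init__(self, layouts, epsilon=0.1, seed=0):
        self.layouts = list(layouts)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.clicks = defaultdict(float)  # summed rewards per (context, layout)
        self.shows = defaultdict(int)     # impressions per (context, layout)

    def choose(self, context):
        # Explore: with probability epsilon, pick a layout at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.layouts)

        # Exploit: pick the layout with the highest estimated CTR.
        # Layouts never shown in this context get an optimistic estimate of 1.0.
        def estimated_ctr(layout):
            n = self.shows[(context, layout)]
            return self.clicks[(context, layout)] / n if n else 1.0

        return max(self.layouts, key=estimated_ctr)

    def update(self, context, layout, reward):
        self.shows[(context, layout)] += 1
        self.clicks[(context, layout)] += reward


def replay(policy, log):
    """Estimate a policy's CTR from logged (context, layout, click) events.

    Assumption: the logged layouts were chosen uniformly at random, so the
    events on which the policy agrees with the log are representative.
    """
    matched, total_reward = 0, 0.0
    for context, logged_layout, click in log:
        # Keep the event only if the policy would have shown the logged layout;
        # otherwise the outcome of the policy's choice is unknown, so skip it.
        if policy.choose(context) == logged_layout:
            policy.update(context, logged_layout, click)
            matched += 1
            total_reward += click
    estimate = total_reward / matched if matched else 0.0
    return estimate, matched
```

A call such as `replay(EpsilonGreedy(["A", "B"]), log)` returns the estimated CTR together with the number of logged events that matched the policy's choices. Only matched events contribute, so a large log is needed when there are many layouts.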