Online controlled experiments at large scale

Authors:
Ron Kohavi; Alex Deng; Brian Frasca; Toby Walker; Ya Xu; Nils Pohlmann
Affiliations:
Microsoft, Redmond, WA, USA (all authors)
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013

Abstract

Web-facing companies, including Amazon, eBay, Etsy, Facebook, Google, Groupon, Intuit, LinkedIn, Microsoft, Netflix, Shop Direct, StumbleUpon, Yahoo, and Zynga, use online controlled experiments to guide product development and accelerate innovation. At Microsoft's Bing, the use of controlled experiments has grown exponentially over time, with over 200 concurrent experiments now running on any given day. Running experiments at large scale requires addressing multiple challenges in three areas: cultural/organizational, engineering, and trustworthiness. On the cultural and organizational front, the larger organization needs to learn the reasons for running controlled experiments and the tradeoffs between controlled experiments and other methods of evaluating ideas. We discuss why negative experiments, which degrade the user experience in the short term, should be run, given their learning value and long-term benefits. On the engineering side, we architected a highly scalable system, able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users. Classical testing and debugging techniques no longer apply when there are billions of live variants of the site, so alerts are used to identify issues rather than relying on heavy up-front testing. On the trustworthiness front, we have a high occurrence of false positives that we address, and we alert experimenters to statistical interactions between experiments. The Bing Experimentation System is credited with having accelerated innovation and increased annual revenues by hundreds of millions of dollars, by allowing us to find and focus on key ideas evaluated through thousands of controlled experiments. A 1% improvement to revenue equals more than $10M annually in the US, yet many ideas impact key metrics by 1% and are not well estimated a priori. The system has also identified many negative features that we avoided deploying, despite key stakeholders' early excitement, saving us similarly large amounts.
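
The abstract's point about a high occurrence of false positives can be illustrated with standard multiple-testing arithmetic. The sketch below is not the authors' method. It assumes a significance level of 0.05 (a conventional choice, not stated in the abstract), statistically independent tests, and no true effect in any of them. The function names are hypothetical, and the figure of 200 is borrowed from the abstract's count of concurrent experiments purely for scale.

```python
def expected_false_positives(alpha: float, num_tests: int) -> float:
    """Expected number of null tests that appear significant by chance."""
    return alpha * num_tests


def prob_at_least_one_false_positive(alpha: float, num_tests: int) -> float:
    """Chance that at least one of num_tests independent null tests
    crosses the significance threshold."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or
    below alpha (Bonferroni correction, a generic textbook remedy)."""
    return alpha / num_tests


if __name__ == "__main__":
    alpha = 0.05      # assumed significance level
    num_tests = 200   # illustrative: one test per concurrent experiment
    print(expected_false_positives(alpha, num_tests))
    print(prob_at_least_one_false_positive(alpha, num_tests))
    print(bonferroni_threshold(alpha, num_tests))
```

Under these assumptions, some spurious "wins" are expected every day even if no idea has any real effect, which is why a system at this scale needs explicit safeguards instead of trusting each significant result at face value.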