Web-facing companies, including Amazon, eBay, Etsy, Facebook, Google, Groupon, Intuit, LinkedIn, Microsoft, Netflix, Shop Direct, StumbleUpon, Yahoo, and Zynga, use online controlled experiments to guide product development and accelerate innovation. At Microsoft's Bing, the use of controlled experiments has grown exponentially over time, with over 200 concurrent experiments now running on any given day. Running experiments at large scale requires addressing multiple challenges in three areas: cultural/organizational, engineering, and trustworthiness. On the cultural and organizational front, the larger organization needs to learn the reasons for running controlled experiments and the tradeoffs between controlled experiments and other methods of evaluating ideas. We discuss why negative experiments, which degrade the user experience short term, should be run, given their learning value and long-term benefits. On the engineering side, we architected a highly scalable system able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users. Classical testing and debugging techniques no longer apply when there are billions of live variants of the site, so alerts are used to identify issues rather than relying on heavy up-front testing. On the trustworthiness front, we address a high occurrence of false positives, and we alert experimenters to statistical interactions between experiments. The Bing Experimentation System is credited with having accelerated innovation and increased annual revenues by hundreds of millions of dollars, by allowing us to find and focus on key ideas evaluated through thousands of controlled experiments. A 1% improvement to revenue equals more than $10M annually in the US, yet many ideas impact key metrics by 1%, and such effects are hard to estimate a priori.
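To see why detecting effects of this size requires experiments with millions of users, consider a standard power calculation for a two-sample test. The sketch below is illustrative, not the paper's method; the coefficient of variation of 3 for revenue per user is an assumed value chosen to reflect heavy-tailed revenue metrics.

```python
# Sketch: approximate users needed per variant to detect a small relative
# lift with a two-sided two-sample test at 5% significance and 80% power.
# The coefficient of variation (cv = std/mean) of 3 is an assumption,
# not a figure from the paper.
from math import ceil

def sample_size_per_variant(rel_lift, cv, z_alpha=1.96, z_beta=0.84):
    """Approximate per-variant sample size.

    rel_lift -- relative change to detect (0.01 for a 1% lift)
    cv       -- coefficient of variation of the metric (assumed)
    z_alpha  -- z for two-sided alpha = 0.05
    z_beta   -- z for 80% power
    """
    return ceil(2 * (z_alpha + z_beta) ** 2 * (cv / rel_lift) ** 2)

# Detecting a 1% lift under these assumptions takes on the order of
# 1.4 million users per variant -- consistent with experiments that
# each contain millions of users.
n = sample_size_per_variant(0.01, 3.0)
```

Halving the detectable lift quadruples the required sample, which is why small-but-valuable effects dominate the cost of experimentation at this scale.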
The system has also identified many negative features that we avoided deploying despite key stakeholders' early excitement, saving us comparably large amounts.
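One way to alert on statistical interactions between overlapping experiments, sketched below under simplifying assumptions (this is not the paper's implementation), is to estimate experiment A's treatment effect separately within B's control and B's treatment populations and flag a significant difference between the two estimates.

```python
# Sketch of an interaction alert between two overlapping experiments A and B.
# Idea: A's treatment effect should be the same for users in B's control and
# in B's treatment; a large z-score on the difference of the two effect
# estimates suggests an A x B interaction. Data and the 1.96 threshold are
# illustrative assumptions.
from math import sqrt
from statistics import mean, stdev

def effect_and_se(treatment, control):
    """Difference in means and its (unpooled) standard error."""
    d = mean(treatment) - mean(control)
    se = sqrt(stdev(treatment) ** 2 / len(treatment) +
              stdev(control) ** 2 / len(control))
    return d, se

def interaction_z(a_t_in_bc, a_c_in_bc, a_t_in_bt, a_c_in_bt):
    """z-score comparing A's effect inside B-control vs inside B-treatment.

    Each argument is a list of per-user metric values for one of the four
    (A variant, B variant) cells.
    """
    d1, se1 = effect_and_se(a_t_in_bc, a_c_in_bc)
    d2, se2 = effect_and_se(a_t_in_bt, a_c_in_bt)
    return (d2 - d1) / sqrt(se1 ** 2 + se2 ** 2)

def alert(z, threshold=1.96):
    """Fire an interaction alert when |z| exceeds the threshold."""
    return abs(z) > threshold
```

With hundreds of concurrent experiments, the number of experiment pairs grows quadratically, so in practice the alert threshold would need to account for the resulting multiple-comparison burden.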