Controlled experiments on the web: survey and practical guide

Authors:
Ron Kohavi; Roger Longbotham; Dan Sommerfield; Randal M. Henne
Affiliations:
Microsoft, One Microsoft Way, Redmond, USA 98052 (all authors)
Venue:
Data Mining and Knowledge Discovery
Year:
2009

Abstract

The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments, A/B tests (and their generalizations), split tests, Control/Treatment tests, MultiVariable Tests (MVT) and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end-users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person's Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.
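
The abstract notes that randomization and hashing are harder in practice than often assumed. As a minimal sketch of what hash-based assignment can look like, the Python below maps a user to a variant deterministically. The function name, the use of MD5, and the experiment-specific prefix are illustrative assumptions, not the authors' implementation, and the paper's own evaluation of hash functions is not reproduced here.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically map a user to a variant (hypothetical helper).

    Assumption: prefixing the key with the experiment id is meant to keep
    assignments in different experiments from being correlated.
    """
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    # Interpret the first 8 bytes as an integer and reduce to a bucket index.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # The same user always lands in the same variant of a given experiment.
    print(assign_variant("user-123", "homepage-test"))
```

Because the mapping depends only on the user and experiment identifiers, a returning user sees a consistent experience without any stored assignment table. Whether a given hash function yields balanced, independent assignments across concurrent experiments is an empirical question that should be checked rather than assumed.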