Online controlled experiments are at the heart of making data-driven decisions at a diverse set of companies, including Amazon, eBay, Facebook, Google, Microsoft, Yahoo, and Zynga. Small differences in key metrics, on the order of fractions of a percent, may have significant business implications. At Bing it is not uncommon to see experiments that impact annual revenue by millions of dollars, even tens of millions of dollars, either positively or negatively. With thousands of experiments being run annually, improving the sensitivity of experiments allows more precise assessment of value; equivalently, experiments can be run on smaller populations (supporting more experiments) or for shorter durations (improving the feedback cycle and agility). We propose an approach (CUPED) that utilizes data from the pre-experiment period to reduce metric variability and hence achieve better sensitivity. The technique is applicable to a wide variety of key business metrics, and it is practical and easy to implement. Results from Bing's experimentation system are striking: variance can be reduced by about 50%, effectively achieving the same statistical power with only half of the users, or half the duration.
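The core idea can be sketched with a small simulation. The adjusted metric below uses the standard control-variate form Y_cuped = Y - theta * (X - mean(X)) with theta = cov(X, Y) / var(X), where X is the pre-experiment value of the metric; the specific metric names and the simulated numbers are illustrative assumptions, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical per-user metrics: x is the pre-experiment value of the
# metric (e.g. queries per user), y its value during the experiment.
x = rng.normal(10.0, 2.0, n)            # pre-experiment metric
y = 0.8 * x + rng.normal(0.0, 1.0, n)   # experiment metric, correlated with x

# CUPED adjustment: y_cuped = y - theta * (x - mean(x)),
# where theta = cov(x, y) / var(x) minimizes Var(y_cuped).
theta = np.cov(x, y)[0, 1] / np.var(x)
y_cuped = y - theta * (x - x.mean())

# The adjusted metric keeps the same mean but has lower variance;
# the reduction factor equals the squared correlation between x and y.
reduction = 1 - np.var(y_cuped) / np.var(y)
print(f"variance reduced by {reduction:.0%}")
```

The mean of the metric is unchanged by the adjustment, so treatment-effect estimates are unbiased, while the narrower variance translates directly into smaller confidence intervals for the same sample size.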