Design principles of massive, robust prediction systems

Authors:
Troy Raeder;Ori Stitelman;Brian Dalessandro;Claudia Perlich;Foster Provost
Affiliations:
M6D Research, New York, NY, USA;M6D Research, New York, NY, USA;M6D Research, New York, NY, USA;M6D Research, New York, NY, USA;New York University and M6D Research, New York, NY, USA
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012


Abstract

Most data mining research is concerned with building high-quality classification models in isolation. In massive production systems, however, the ability to monitor and maintain performance over time while growing in size and scope is equally important. Many external factors may degrade classification performance, including changes in data distribution, noise or bias in the source data, and the evolution of the system itself. A well-functioning system must handle all of these gracefully. This paper lays out a set of design principles for large-scale autonomous data mining systems and then demonstrates our application of these principles within the m6d automated ad targeting system. We demonstrate a comprehensive set of quality control processes that allow us to monitor and maintain thousands of distinct classification models automatically, and to add new models, take on new data, and correct poorly performing models without manual intervention or system disruption.