Most data mining research is concerned with building high-quality classification models in isolation. In massive production systems, however, the ability to monitor and maintain performance over time while growing in size and scope is equally important. Many factors may degrade classification performance, including changes in data distribution, noise or bias in the source data, and the evolution of the system itself. A well-functioning system must handle all of these gracefully. This paper lays out a set of design principles for large-scale autonomous data mining systems and then demonstrates our application of these principles within the m6d automated ad targeting system. We present a comprehensive set of quality control processes that allow us to monitor and maintain thousands of distinct classification models automatically, and to add new models, take on new data, and correct poorly performing models without manual intervention or system disruption.