In a large online advertising system, adversaries may attempt to profit from the creation of low-quality or harmful advertisements. In this paper, we present a large-scale data mining effort that detects and blocks such adversarial advertisements for the benefit and safety of our users. Because both false positives and false negatives have high cost, our deployed system uses a tiered strategy combining automated and semi-automated methods to ensure reliable classification. We also employ strategies to address the challenges of learning from highly skewed data at scale, allocating the effort of human experts, leveraging domain expert knowledge, and independently assessing the effectiveness of our system.