Estimating rates of rare events with multiple hierarchies through scalable log-linear models

Authors:
Deepak Agarwal;Rahul Agrawal;Rajiv Khanna;Nagaraj Kota
Affiliations:
Yahoo!, Sunnyvale, CA, USA;Yahoo!, Bangalore, India;Yahoo!, Bangalore, India;Yahoo!, Bangalore, India
Venue:
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2010

Citing 9
Cited 18

Empirical bayes screening for multi-item associations

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Predicting clicks: estimating the click-through rate for new ads

Proceedings of the 16th international conference on World Wide Web
Hierarchical maximum entropy density estimation

Proceedings of the 24th international conference on Machine learning
Estimating rates of rare events at multiple resolutions

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Computational advertising

Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Matchbox: large scale online bayesian recommendations

Proceedings of the 18th international conference on World wide web
Feature hashing for large scale multitask learning

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Regression-based latent factor models

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
MapReduce: a flexible data processing tool

Communications of the ACM - Amir Pnueli: Ahead of His Time

Latent OLAP: data cubes over latent variables

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
The sum of its parts: reducing sparsity in click estimation with query segments

Information Retrieval
Response prediction using collaborative filtering with hierarchies and side-information

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Real-time bidding algorithms for performance-based display ad allocation

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Temporal multi-hierarchy smoothing for estimating rates of rare events

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Post-click conversion modeling and analysis for non-guaranteed delivery display advertising

Proceedings of the fifth ACM international conference on Web search and data mining
Personalized click model through collaborative filtering

Proceedings of the fifth ACM international conference on Web search and data mining
Estimating conversion rate in display advertising from past performance data

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Multimedia features for click prediction of new ads in display advertising

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Position-normalized click prediction in search advertising

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Traffic quality based pricing in paid search using two-stage regression

Proceedings of the 22nd international conference on World Wide Web companion
Scalable supervised dimensionality reduction using clustering

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
CTR prediction for contextual advertising: learning-to-rank approach

Proceedings of the Seventh International Workshop on Data Mining for Online Advertising
Real time bid optimization with smooth budget delivery in online advertising

Proceedings of the Seventh International Workshop on Data Mining for Online Advertising
Forecasting user visits for online display advertising

Information Retrieval
Predicting response in mobile advertising with hierarchical importance-aware factorization machine

Proceedings of the 7th ACM international conference on Web search and data mining
LASER: a scalable response prediction platform for online advertising

Proceedings of the 7th ACM international conference on Web search and data mining
Machine learning for targeted display advertising: transfer learning in action

Machine Learning

Abstract

We consider the problem of estimating rates of rare events for high-dimensional, multivariate categorical data where several dimensions are hierarchical. Such problems are routine in several data mining applications including computational advertising, our main focus in this paper. We propose LMMH, a novel log-linear modeling method that scales to massive data applications with billions of training records and several million potential predictors in a map-reduce framework. Our method exploits correlations in aggregates observed at multiple resolutions when working with multiple hierarchies; stable estimates at coarser resolutions provide informative prior information to improve estimates at finer resolutions. Beyond prediction accuracy and scalability, our method has a built-in variable screening procedure based on a "spike and slab prior" that provides parsimony by removing non-informative predictors without hurting predictive accuracy. We perform large-scale experiments on data from real computational advertising applications and illustrate our approach on datasets with several billion records and hundreds of millions of predictors. Extensive comparisons with other benchmark methods show significant improvements in prediction accuracy.
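
The idea that a stable coarse-resolution estimate can act as a prior for sparse fine-resolution estimates can be illustrated with a much simpler technique than LMMH. The sketch below is not the paper's log-linear model and has no spike-and-slab screening; it is a plain Gamma-Poisson style shrinkage over a single two-level hierarchy. The function name `smoothed_rates`, the `prior_strength` parameter, and the example counts are all hypothetical, chosen only for illustration.

```python
def smoothed_rates(children, prior_strength=100.0):
    """Shrink each child's event rate toward the pooled parent rate.

    children: dict mapping a child node name to (events, trials),
              where all children share one parent in the hierarchy.
    prior_strength: assumed number of pseudo-trials contributed by the
              parent; larger values pull sparse children harder toward
              the parent rate. (Hypothetical tuning parameter.)
    """
    total_events = sum(events for events, _ in children.values())
    total_trials = sum(trials for _, trials in children.values())
    if total_trials == 0:
        raise ValueError("parent node has no trials; rate is undefined")

    # Coarse-resolution estimate: pooled over all children, hence stable.
    parent_rate = total_events / total_trials

    # Fine-resolution estimates: posterior mean under a Gamma prior
    # centred on parent_rate with weight prior_strength.
    return {
        name: (events + prior_strength * parent_rate) / (trials + prior_strength)
        for name, (events, trials) in children.items()
    }


# Made-up counts: one well-observed child and two sparse ones.
rates = smoothed_rates({
    "ad_a": (50, 10000),  # plenty of data, estimate barely moves
    "ad_b": (0, 20),      # raw rate 0.0, pulled up toward the parent
    "ad_c": (3, 50),      # raw rate 0.06, pulled down toward the parent
})
```

With these counts, the well-observed child keeps an estimate close to its raw rate, while the two sparse children move toward the pooled rate instead of reporting 0 or an inflated value. LMMH as described in the abstract goes well beyond this by handling several hierarchies jointly and at scale.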