Estimating rates of rare events at multiple resolutions

Authors:
Deepak Agarwal;Andrei Zary Broder;Deepayan Chakrabarti;Dejan Diklic;Vanja Josifovski;Mayssam Sayyadian
Affiliations:
Yahoo! Research;Yahoo! Research;Yahoo! Research;Yahoo! Research;Yahoo! Research;Yahoo! Research
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Abstract

We consider the problem of estimating occurrence rates of rare events for extremely sparse data, using pre-existing hierarchies to perform inference at multiple resolutions. In particular, we focus on the problem of estimating click rates for (webpage, advertisement) pairs (called impressions), where both the pages and the ads are classified into hierarchies that capture broad contextual information at different levels of granularity. Typically, the click rates are low and the coverage of the hierarchies is sparse. To overcome these difficulties, we devise a sampling method whereby we analyze a specially chosen sample of pages in the training set, and then estimate click rates using a two-stage model. The first stage imputes the number of (webpage, ad) pairs at all resolutions of the hierarchy to adjust for the sampling bias. The second stage estimates click rates at all resolutions after incorporating correlations among sibling nodes through a tree-structured Markov model. Both models are scalable and suited to large-scale data mining applications. On a real-world dataset consisting of half a billion impressions, we demonstrate that even with 95% negative (non-clicked) events in the training set, our method can effectively discriminate extremely rare events in terms of their click propensity.
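
The idea of estimating rates at multiple resolutions can be illustrated with a small sketch. The code below is not the two-stage model from the paper: it swaps in a much simpler pseudo-count shrinkage, in which each node's click rate is pulled toward its parent's smoothed rate, so that sparse nodes borrow strength from coarser levels of the hierarchy. The `Node` class, the `strength` parameter, the node names, and all counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a hypothetical hierarchy, holding raw click and impression counts."""
    name: str
    clicks: int = 0
    impressions: int = 0
    children: List["Node"] = field(default_factory=list)


def aggregate(node: Node) -> None:
    """Roll counts up the tree so each internal node holds its subtree totals.

    Call this once only: a second call would add the child counts again.
    """
    for child in node.children:
        aggregate(child)
        node.clicks += child.clicks
        node.impressions += child.impressions


def smoothed_rates(
    node: Node,
    prior_rate: float,
    strength: float = 1000.0,
    out: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Shrink each node's raw rate toward its parent's smoothed rate.

    `strength` is an assumed pseudo-count: the prior acts like that many
    extra impressions observed at the parent's rate.
    """
    if out is None:
        out = {}
    rate = (node.clicks + strength * prior_rate) / (node.impressions + strength)
    out[node.name] = rate
    for child in node.children:
        smoothed_rates(child, rate, strength, out)
    return out


# Made-up counts: one sparse leaf with no clicks, one well-observed leaf.
sparse_leaf = Node("sports/shoes", clicks=0, impressions=50)
dense_leaf = Node("sports/tickets", clicks=30, impressions=10_000)
root = Node("sports", children=[sparse_leaf, dense_leaf])

aggregate(root)
root_raw_rate = root.clicks / root.impressions
# Using the root's own raw rate as its prior leaves the root estimate unchanged.
rates = smoothed_rates(root, prior_rate=root_raw_rate)

for name, rate in rates.items():
    print(f"{name}: {rate:.5f}")
```

In this toy example the sparse leaf has no observed clicks, so its raw rate would be zero, yet its smoothed rate stays positive because it inherits most of its estimate from the parent. The well-observed leaf stays close to its own raw rate. The paper's model differs in that it also corrects for sampling bias and models correlations among sibling nodes.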