Temporal multi-hierarchy smoothing for estimating rates of rare events

Authors:
Nagaraj Kota; Deepak Agarwal
Affiliations:
Yahoo! Labs, Bangalore, India; Yahoo! Research, Santa Clara, CA, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

We consider the problem of estimating rates of rare events obtained through interactions among several categorical variables that are heavy-tailed and hierarchical. In our previous work, we proposed a scalable log-linear model called LMMH (Log-Linear Models for Multiple Hierarchies) that combats data sparsity at granular levels through small-sample-size corrections that borrow strength from rate estimates at coarser resolutions. This paper extends our previous work in two directions. First, we model excess heterogeneity by fitting local LMMH models to relatively homogeneous subsets of the data. To ensure scalable computation, these subsets are induced through a decision tree; we call the resulting method Treed-LMMH. Second, the Treed-LMMH method is coupled with a temporal smoothing procedure based on a fast Kalman-filter-style algorithm. We show that simultaneously performing hierarchical and temporal smoothing leads to significant improvement in predictive accuracy. Our methods are illustrated on a large-scale computational advertising dataset consisting of billions of observations and hundreds of millions of attribute combinations (cells).
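
The two smoothing ideas named in the abstract can be illustrated with a minimal sketch. This is not the LMMH or Treed-LMMH algorithm itself: it assumes a simple pseudo-count shrinkage of a sparse cell's rate toward a coarser (parent) rate, followed by a scalar local-level Kalman filter over time. The function names, the `prior_strength` parameter, and the variance settings are all hypothetical choices made for illustration.

```python
def shrink_rate(events, trials, parent_rate, prior_strength=100.0):
    """Shrink a cell's raw rate toward its parent's rate.

    Assumption: prior_strength acts as a pseudo-count of trials observed
    at the parent rate, so sparse cells lean on the coarser estimate.
    """
    return (events + prior_strength * parent_rate) / (trials + prior_strength)


def kalman_filter_rates(observations, obs_vars, process_var, init_mean, init_var):
    """Scalar local-level Kalman filter; returns the filtered means.

    Assumption: the true rate follows a random walk with variance
    process_var, and each observation has its own noise variance.
    """
    mean, var = init_mean, init_var
    filtered = []
    for y, r in zip(observations, obs_vars):
        var = var + process_var          # predict: uncertainty grows over time
        gain = var / (var + r)           # Kalman gain: weight on the new observation
        mean = mean + gain * (y - mean)  # update the state estimate
        var = (1.0 - gain) * var         # update the state uncertainty
        filtered.append(mean)
    return filtered


# Hypothetical daily data for one sparse cell whose parent rate is 0.001.
daily_events = [0, 2, 1]
daily_trials = [1000, 1500, 800]
shrunk = [shrink_rate(e, n, parent_rate=0.001)
          for e, n in zip(daily_events, daily_trials)]
smoothed = kalman_filter_rates(shrunk, obs_vars=[1e-6] * 3, process_var=1e-7,
                               init_mean=0.001, init_var=1e-6)
```

In this sketch a cell with no trials falls back entirely to the parent rate, and the filter pulls each day's estimate toward the previous state in proportion to how noisy the new observation is assumed to be.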