Temporal multi-hierarchy smoothing for estimating rates of rare events

  • Authors:
  • Nagaraj Kota; Deepak Agarwal

  • Affiliations:
  • Yahoo! Labs, Bangalore, India; Yahoo! Research, Santa Clara, CA, USA

  • Venue:
  • Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year:
  • 2011

Abstract

We consider the problem of estimating rates of rare events obtained through interactions among several categorical variables that are heavy-tailed and hierarchical. In our previous work, we proposed a scalable log-linear model called LMMH (Log-Linear Models for Multiple Hierarchies) that combats data sparsity at granular levels through small-sample-size corrections that borrow strength from rate estimates at coarser resolutions. This paper extends that work in two directions. First, we model excess heterogeneity by fitting local LMMH models to relatively homogeneous subsets of the data. To ensure scalable computation, these subsets are induced through a decision tree; we call this method Treed-LMMH. Second, we couple the Treed-LMMH method with a temporal smoothing procedure based on a fast Kalman-filter-style algorithm. We show that simultaneously performing hierarchical and temporal smoothing leads to significant improvement in predictive accuracy. Our methods are illustrated on a large-scale computational advertising dataset consisting of billions of observations and hundreds of millions of attribute combinations (cells).
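
To make the two kinds of smoothing concrete, the following is a minimal toy sketch, not the LMMH or Treed-LMMH algorithm itself. It assumes a simple pseudo-count (Gamma-Poisson style) shrinkage of a sparse cell's rate toward the rate of its coarser parent, followed by a scalar random-walk Kalman filter over time. The function names, the `prior_strength` parameter, and all numbers are hypothetical and chosen only for illustration.

```python
def shrink_rate(events, trials, parent_rate, prior_strength):
    """Shrink a cell's raw rate toward its parent's rate.

    Assumption: the parent contributes `prior_strength` pseudo-trials at
    rate `parent_rate`. Cells with few trials land close to the parent
    rate; cells with many trials stay close to their own raw rate.
    """
    return (events + prior_strength * parent_rate) / (trials + prior_strength)


def kalman_local_level(observations, obs_var, process_var, init_mean, init_var):
    """Scalar Kalman filter for a random-walk (local level) state.

    Returns the filtered mean after each observation.
    """
    mean, var = init_mean, init_var
    filtered = []
    for y in observations:
        # Predict: the state follows a random walk, so uncertainty grows.
        var = var + process_var
        # Update: blend the prediction with the new observation.
        gain = var / (var + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        filtered.append(mean)
    return filtered


# Hypothetical per-period counts for one sparse cell.
events = [0, 1, 0, 2]
trials = [100, 120, 80, 150]
parent_rate = 0.01  # assumed rate of the coarser (parent) cell

# Step 1: hierarchical smoothing within each time period.
shrunk = [shrink_rate(e, t, parent_rate, prior_strength=200.0)
          for e, t in zip(events, trials)]

# Step 2: temporal smoothing across periods.
smoothed = kalman_local_level(shrunk, obs_var=1e-5, process_var=1e-6,
                              init_mean=parent_rate, init_var=1e-5)
```

In this sketch the two steps run in sequence, whereas the abstract describes performing hierarchical and temporal smoothing simultaneously; the sequential form is used here only to keep the example short.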