Variable selection for ad prediction

  • Authors: Suma Bhat; Kenneth Church

  • Affiliations: UIUC, Urbana-Champaign, IL; Microsoft Research, Redmond, WA

  • Venue: Proceedings of the 2nd International Workshop on Data Mining and Audience Intelligence for Advertising
  • Year: 2008

Abstract

We consider the problem of predicting the probability that an advertisement is clicked when the click or no-click outcome is described by a large set of variables. Many, if not most, of these variables are only weakly related to whether the ad is clicked. A traditional approach that treats every variable on an equal and blind footing therefore loses interpretability in explaining the process underlying the outcome. Such an approach is also computationally expensive and may generalize poorly. We investigate the forward selection method for variable subset selection in the domain of advertisement click-through-rate prediction. Forward selection proceeds sequentially, rewarding a set of variables for the information it provides about the outcome while penalizing the set for the number of variables it contains. Concretely, we propose a logistic regression model for estimating the conditional expectation of the outcome given the ensemble of variables. The resulting model compares favorably with the one obtained by an exhaustive search through the model space. We also observe that the variables selected by forward selection have better predictive power than those selected on the basis of their individual statistical significance. Thus we show that forward selection for subset selection produces a good model for predicting ad click-through rates.
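
The forward selection procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not name the penalized criterion, so the sketch assumes the Akaike information criterion (AIC), which trades log-likelihood against the number of fitted parameters. The data are synthetic, and every name here (sigmoid, fit_logistic, aic, forward_select) is hypothetical rather than taken from the paper.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # w[0] is the intercept; the remaining weights pair with the features in x.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(rows, y, steps=150, lr=0.5):
    # Plain batch gradient ascent on the mean log-likelihood (a simple stand-in
    # for a proper solver; assumed adequate for this small example).
    n, d = len(rows), len(rows[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for x, t in zip(rows, y):
            err = t - predict(w, x)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def aic(cols, X, y):
    # AIC = 2 * (number of parameters) - 2 * log-likelihood; lower is better.
    rows = [[x[c] for c in cols] for x in X]
    w = fit_logistic(rows, y)
    ll = 0.0
    for x, t in zip(rows, y):
        p = min(max(predict(w, x), 1e-12), 1.0 - 1e-12)
        ll += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return 2 * (len(cols) + 1) - 2 * ll

def forward_select(X, y):
    # Start from the intercept-only model and greedily add the variable that
    # lowers the penalized score most; stop when no addition improves it.
    remaining = set(range(len(X[0])))
    selected = []
    best = aic(selected, X, y)
    while remaining:
        score, col = min((aic(selected + [c], X, y), c) for c in sorted(remaining))
        if score >= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best

# Synthetic click data: only variables 0 and 2 influence the click probability.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2.0 * x[0] - 2.0 * x[2]) else 0 for x in X]

selected, score = forward_select(X, y)
print("selected variables:", selected, "AIC:", round(score, 2))
```

Each step costs one model fit per remaining candidate, so the search fits on the order of the square of the number of variables rather than the exponentially many models of an exhaustive search. A different penalty (for example BIC) could be swapped into aic without changing the search loop.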