Forecasting high-dimensional data

  • Authors:
  • Deepak Agarwal;Datong Chen;Long-ji Lin;Jayavel Shanmugasundaram;Erik Vee

  • Affiliations:
  • Yahoo! Research, Santa Clara, CA, USA;Yahoo! Labs, Santa Clara, CA, USA;Yahoo! Labs, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA

  • Venue:
  • Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2010

Abstract

We propose a method for forecasting high-dimensional data (hundreds of attributes, trillions of attribute combinations) for a duration of several months. Our motivating application is guaranteed display advertising, a multi-billion-dollar industry in which advertisers can buy targeted (high-dimensional) user visits from publishers many months or even years in advance. Forecasting high-dimensional data is challenging because of the many possible attribute combinations that need to be forecast. To address this issue, we propose a method in which only a subset of attribute combinations is explicitly forecast and stored, while the other combinations are forecast dynamically, on the fly, using high-dimensional attribute correlation models. We evaluate various attribute correlation models, from simple models that assume the independence of attributes to more sophisticated sample-based models that fully capture the correlations in a high-dimensional space. Our evaluation using real-world display advertising data sets shows that fully capturing high-dimensional correlations leads to significant forecast accuracy gains. A variant of the proposed method has been implemented in the context of Yahoo!'s guaranteed display advertising system.
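
The contrast between an independence-based correlation model and a sample-based one can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy visit sample, and the stored total of 1000 visits are all assumptions made for illustration. The idea shown is that a stored forecast for a coarse combination is scaled by an estimated fraction for the finer combination, where that fraction comes either from multiplying per-attribute marginals (independence) or from counting matching records in a sample of past visits.

```python
from typing import Dict, List

Visit = Dict[str, str]  # one sampled user visit: attribute name -> value


def matches(visit: Visit, query: Visit) -> bool:
    """True if the visit has every attribute value requested in the query."""
    return all(visit.get(attr) == value for attr, value in query.items())


def independence_fraction(sample: List[Visit], query: Visit) -> float:
    """Estimate the matching fraction as a product of per-attribute marginals.

    Assumes attributes are independent, so correlations are ignored.
    """
    n = len(sample)
    fraction = 1.0
    for attr, value in query.items():
        fraction *= sum(1 for v in sample if v.get(attr) == value) / n
    return fraction


def sample_fraction(sample: List[Visit], query: Visit) -> float:
    """Estimate the matching fraction by counting sampled visits that
    satisfy the whole query, which preserves attribute correlations."""
    return sum(1 for v in sample if matches(v, query)) / len(sample)


def forecast(stored_total: float, fraction: float) -> float:
    """Scale an explicitly stored forecast down to a finer combination."""
    return stored_total * fraction


# Hypothetical toy data: gender and region are perfectly correlated here.
SAMPLE: List[Visit] = [
    {"gender": "f", "region": "CA"},
    {"gender": "f", "region": "CA"},
    {"gender": "m", "region": "NY"},
    {"gender": "m", "region": "NY"},
]
QUERY: Visit = {"gender": "f", "region": "CA"}
STORED_TOTAL = 1000.0  # assumed stored forecast for all visits

indep = forecast(STORED_TOTAL, independence_fraction(SAMPLE, QUERY))
sampled = forecast(STORED_TOTAL, sample_fraction(SAMPLE, QUERY))
print(indep, sampled)
```

In this toy sample the two attributes always co-occur, so the independence estimate (0.5 × 0.5 of the total) is half the sample-based estimate (2 of 4 visits match). That gap is a simple instance of the kind of error that a model ignoring correlations can introduce.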