Predicting who rated what in large-scale datasets

  • Authors: Yan Liu; Zhenzhen Kou

  • Affiliations: IBM T. J. Watson Research Center, Yorktown Heights, NY; Carnegie Mellon University, Pittsburgh, PA

  • Venue: ACM SIGKDD Explorations Newsletter - Special issue on visual analytics
  • Year: 2007

Abstract

KDD Cup 2007 focuses on movie rating behavior. The goal of the "Who Rated What" task is to predict whether "existing" users will review "existing" movies in the future. We cast the task as a link prediction problem and address it with a simple classification approach. Compared with other link prediction applications, our task poses two major challenges: (1) the huge size of the Netflix data; and (2) a prediction target complicated by many factors, such as a general decline of interest in old movies and a growing tendency of Netflix users to review more movies, driven by the success of the Internet DVD rental industry. We address the first challenge by "selective" subsampling and the second by effectively combining information from the review scores, movie content, and graph topology.
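As a rough illustration of casting "who rated what" as link prediction via classification, the sketch below builds candidate (user, movie) pairs from past ratings, labels them by whether the link appears later, subsamples the negatives, and fits a small logistic regression. Everything specific here is an assumption and not taken from the paper: the toy data, the helper names (`build_index`, `pair_features`, `train_logistic`, `predict`), the three features, and the classifier. In particular, uniform random subsampling stands in for the authors' "selective" subsampling, whose details the abstract does not give, and no movie-content features are included.

```python
import math
import random

# Hypothetical toy data: (user, movie, score) triples from the "past" period.
HISTORY = [
    ("u1", "m1", 5), ("u1", "m2", 4), ("u1", "m3", 3),
    ("u2", "m1", 4), ("u2", "m2", 2),
    ("u3", "m1", 5),
    ("u4", "m2", 3), ("u4", "m4", 1),
]
# Hypothetical (user, movie) links that appear in the "future" period.
FUTURE = {("u2", "m3"), ("u3", "m2"), ("u4", "m1")}


def build_index(ratings):
    """Index past ratings by user and by movie (a bipartite graph)."""
    by_user, by_movie = {}, {}
    for user, movie, score in ratings:
        by_user.setdefault(user, {})[movie] = score
        by_movie.setdefault(movie, {})[user] = score
    return by_user, by_movie


def pair_features(user, movie, by_user, by_movie):
    """Illustrative features for one candidate link (assumed, not the paper's)."""
    movie_scores = list(by_movie.get(movie, {}).values())
    mean_score = sum(movie_scores) / len(movie_scores) if movie_scores else 0.0
    return [
        1.0,                                     # bias term
        math.log1p(len(by_user.get(user, {}))),  # topology: user degree
        math.log1p(len(movie_scores)),           # topology: movie degree
        mean_score / 5.0,                        # scores: mean rating, scaled to [0, 1]
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Estimated probability that the candidate link will appear."""
    return sigmoid(sum(w * v for w, v in zip(weights, features)))


def train_logistic(rows, labels, epochs=200, lr=0.5):
    """Plain stochastic gradient ascent on the logistic log-likelihood."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = y - predict(weights, x)
            weights = [w + lr * error * v for w, v in zip(weights, x)]
    return weights


by_user, by_movie = build_index(HISTORY)
seen = {(u, m) for u, m, _ in HISTORY}
# Candidate links: existing users paired with existing movies they have not rated.
candidates = [(u, m) for u in sorted(by_user) for m in sorted(by_movie)
              if (u, m) not in seen]
positives = [pair for pair in candidates if pair in FUTURE]
negatives = [pair for pair in candidates if pair not in FUTURE]
# Uniform subsampling of negatives: a stand-in for "selective" subsampling.
sampled_negatives = random.Random(0).sample(negatives, k=len(positives))

train_pairs = positives + sampled_negatives
rows = [pair_features(u, m, by_user, by_movie) for u, m in train_pairs]
labels = [1] * len(positives) + [0] * len(sampled_negatives)
weights = train_logistic(rows, labels)
score = predict(weights, pair_features("u2", "m3", by_user, by_movie))
```

Features are computed only from `HISTORY`, while labels come from `FUTURE`, which mirrors the temporal split implied by predicting future reviews. At Netflix scale the unrated pairs vastly outnumber the rated ones, which is why some form of subsampling is needed before training.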