KDD Cup 2007 task 1 winner report

  • Authors:
  • Miklós Kurucz; András A. Benczúr; Tamás Kiss; István Nagy; Adrienn Szabó; Balázs Torma

  • Affiliations:
  • Data Mining and Web search Research Group, Informatics Laboratory, Computer and Automation Research Institute of the Hungarian Academy of Sciences (all authors)

  • Venue:
  • ACM SIGKDD Explorations Newsletter - Special issue on visual analytics
  • Year:
  • 2007

Abstract

KDD Cup 2007 focuses on predicting aspects of movie rating behavior. We present our prediction method for Task 1, "Who Rated What in 2006", where the goal is to predict which users rated which movies in 2006. We use a combination of the following methods, listed in order of their accuracy:

  • The predicted number of ratings for each movie, based on time series analysis that also uses movie and DVD release dates and movie series detection by the edit distance of the titles.
  • The predicted number of ratings by each user, using the fact that ratings were sampled proportionally to the margin.
  • The low-rank approximation of the 0-1 matrix of known user-movie pairs with a rating.
  • Prediction using the movie-movie similarity matrix.
  • Association rules obtained by frequent sequence mining of user ratings considered as ordered itemsets.

Combining the predictions by linear regression, we obtained a prediction with root mean squared error 0.256. The first runner-up's result was 0.263, while a pure all-zeros prediction already gives 0.279, indicating the hardness of the task.
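
To make the blending step and the baseline concrete, the following is a minimal sketch, not the authors' implementation. It runs on synthetic data, and every name in it (`low_rank_scores`, `blend`, the margin-product feature) is a hypothetical stand-in chosen for illustration. It assumes 0/1 targets, in which case the RMSE of an all-zeros prediction equals the square root of the fraction of positive pairs (so a baseline of 0.279 would correspond to roughly 7.8% positives, since 0.279² ≈ 0.078). For brevity the regression weights are fitted on the same pairs they are scored on; a real evaluation would fit them on held-out data.

```python
import numpy as np

def rmse(pred, target):
    """Root mean squared error between two flat arrays."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def low_rank_scores(matrix, rank):
    """Rank-`rank` approximation of a matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

def blend(features, target):
    """Combine predictor columns by least-squares linear regression
    with an intercept. Returns (weights, blended predictions)."""
    design = np.column_stack([np.ones(len(target)), features])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights, design @ weights

# Synthetic data (an assumption for illustration only): each user has an
# activity level and each movie a popularity; a pair is rated with
# probability equal to their product.
rng = np.random.default_rng(0)
n_users, n_movies = 200, 50
user_activity = rng.uniform(0.02, 0.3, size=n_users)
movie_popularity = rng.uniform(0.1, 1.0, size=n_movies)
prob = np.outer(user_activity, movie_popularity)
known = (rng.random((n_users, n_movies)) < prob).astype(float)   # past pairs
target = (rng.random((n_users, n_movies)) < prob).astype(float)  # pairs to predict

# Predictor 1: low-rank approximation of the known 0-1 matrix.
svd_scores = low_rank_scores(known, rank=2)

# Predictor 2: product of user and movie margins, a simple stand-in for
# the per-user and per-movie rating-count predictions.
margin_scores = known.mean(axis=1, keepdims=True) * known.mean(axis=0, keepdims=True)

features = np.column_stack([svd_scores.ravel(), margin_scores.ravel()])
y = target.ravel()

weights, blended = blend(features, y)
baseline = rmse(np.zeros_like(y), y)   # all-zeros prediction
blended_rmse = rmse(blended, y)

print("all-zeros RMSE:", baseline)
print("blended RMSE:  ", blended_rmse)
print("weights (intercept, svd, margin):", weights)
```

Because the regression includes an intercept and can set all weights to zero, its in-sample RMSE can never exceed the all-zeros baseline; how much it improves on held-out data depends on how informative the individual predictors are.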